Preface to the Second Edition


I approached revising Topics in Algebra with a certain amount of 
trepidation. On the whole, I was satisfied with the first edition and did 
not want to tamper with it. However, there were certain changes I felt 
should be made, changes which would not affect the general style or 
content, but which would make the book a little more complete. I 
hope that I have achieved this objective in the present version. 

For the most part, the major changes take place in the chapter on 
group theory. When the first edition was written it was fairly uncommon
for a student learning abstract algebra to have had any
previous exposure to linear algebra. Nowadays quite the opposite is
true; many students, perhaps even a majority, have learned something
about 2 × 2 matrices at this stage. Thus I felt free here to draw on
2 × 2 matrices for examples and problems. These parts, which
depend on some knowledge of linear algebra, are indicated with a #. 

In the chapter on groups I have largely expanded one section, that 
on Sylow’s theorem, and added two others, one on direct products and 
one on the structure of finite abelian groups. 

In the previous treatment of Sylow’s theorem, only the existence of a 
Sylow subgroup was shown. This was done following the proof of 
Wielandt. The conjugacy of the Sylow subgroups and their number 
were developed in a series of exercises, but not in the text proper. 
Now all the parts of Sylow’s theorem are done in the text material. 
In addition to the proof previously given for the existence, two other
proofs of existence are carried out. One could accuse me of overkill 
at this point, probably rightfully so. The fact of the matter is that Sylow’s 
theorem is important, that each proof illustrates a different aspect of group 
theory and, above all, that I love Sylow’s theorem. The proof of the conjugacy
and number of Sylow subgroups exploits double cosets. A by-product
of this development is that a means is given for finding Sylow subgroups in a 
large set of symmetric groups. 

For some mysterious reason known only to myself, I had omitted direct 
products in the first edition. Why is beyond me. The material is easy, 
straightforward, and important. This lacuna is now filled in the section 
treating direct products. With this in hand, I go on in the next section to 
prove the decomposition of a finite abelian group as a direct product of 
cyclic groups and also prove the uniqueness of the invariants associated with 
this decomposition. In point of fact, this decomposition was already in the 
first edition, at the end of the chapter on vector spaces, as a consequence of 
the structure of finitely generated modules over Euclidean rings. However, 
the case of a finite group is of great importance by itself; the section on finite 
abelian groups underlines this importance. Its presence in the chapter on 
groups, an early chapter, makes it more likely that it will be taught. 

One other entire section has been added at the end of the chapter on field 
theory. I felt that the student should see an explicit polynomial over an 
explicit field whose Galois group was the symmetric group of degree 5, hence 
one whose roots could not be expressed by radicals. In order to do so, a 
theorem is first proved which gives a criterion that an irreducible polynomial
of degree p, p a prime, over the rational field have $S_p$ as its Galois
group. As an application of this criterion, an irreducible polynomial of 
degree 5 is given, over the rational field, whose Galois group is the symmetric 
group of degree 5. 

There are several other additions. More than 150 new problems are to be 
found here. They are of varying degrees of difficulty. Many are routine 
and computational, many are very difficult. Furthermore, some interpolatory
remarks are made about problems that have given readers a great
deal of difficulty. Some paragraphs have been inserted, others rewritten, at 
places where the writing had previously been obscure or too terse. 

Above I have described what I have added. What gave me greater 
difficulty about the revision was, perhaps, that which I have not added. I 
debated for a long time with myself whether or not to add a chapter on 
category theory and some elementary functors, whether or not to enlarge the 
material on modules substantially. After a great deal of thought and soul-searching,
I decided not to do so. The book, as it stands, has a certain concreteness
about it with which this new material would not blend. It could be
made to blend, but this would require a complete reworking of the material 
of the book and a complete change in its philosophy, something I did not
want to do. A mere addition of this new material, as an adjunct with no 
applications and no discernible goals, would have violated my guiding 
principle that all matters discussed should lead to some clearly defined 
objectives, to some highlight, to some exciting theorems. Thus I decided to 
omit the additional topics. 

Many people wrote me about the first edition pointing out typographical 
mistakes or making suggestions on how to improve the book. I should like to 
take this opportunity to thank them for their help and kindness. 




Preface to the First Edition 


The idea to write this book, and more important the desire to do so, is 
a direct outgrowth of a course I gave in the academic year 1959-1960 at 
Cornell University. The class taking this course consisted, in large part, 
of the most gifted sophomores in mathematics at Cornell. It was my 
desire to experiment by presenting to them material a little beyond that 
which is usually taught in algebra at the junior-senior level. 

I have aimed this book to be, both in content and degree of sophistication,
about halfway between two great classics, A Survey of Modern
Algebra, by Birkhoff and MacLane, and Modern Algebra, by Van der 
Waerden. 

The last few years have seen marked changes in the instruction given 
in mathematics at the American universities. This change is most 
notable at the upper undergraduate and beginning graduate levels. 
Topics that a few years ago were considered proper subject matter for 
semiadvanced graduate courses in algebra have filtered down to, and 
are being taught in, the very first course in abstract algebra. Convinced 
that this filtration will continue and will become intensified in the next 
few years, I have put into this book, which is designed to be used as the 
student’s first introduction to algebra, material which hitherto has been 
considered a little advanced for that stage of the game. 

There is always a great danger when treating abstract ideas to introduce
them too suddenly and without a sufficient base of examples to
render them credible or natural. In order to try to mitigate this, I have
tried to motivate the concepts beforehand and to illustrate them in concrete
situations. One of the most telling proofs of the worth of an abstract
concept is what it, and the results about it, tells us in familiar situations. In
almost every chapter an attempt is made to bring out the significance of the 
general results by applying them to particular problems. For instance, in the 
chapter on rings, the two-square theorem of Fermat is exhibited as a direct 
consequence of the theory developed for Euclidean rings. 

The subject matter chosen for discussion has been picked not only because 
it has become standard to present it at this level or because it is important in 
the whole general development but also with an eye to this “concreteness.” 
For this reason I chose to omit the Jordan-Hölder theorem, which certainly
could have easily been included in the results derived about groups. However,
to appreciate this result for its own sake requires a great deal of hindsight
and to see it used effectively would require too great a digression. True,
one could develop the whole theory of dimension of a vector space as one of 
its corollaries, but, for the first time around, this seems like a much too fancy 
and unnatural approach to something so basic and down-to-earth. Likewise, 
there is no mention of tensor products or related constructions. There is so 
much time and opportunity to become abstract; why rush it at the 
beginning? 

A word about the problems. There are a great number of them. It would 
be an extraordinary student indeed who could solve them all. Some are 
present merely to complete proofs in the text material, others to illustrate 
and to give practice in the results obtained. Many are introduced not so 
much to be solved as to be tackled. The value of a problem is not so much 
in coming up with the answer as in the ideas and attempted ideas it forces 
on the would-be solver. Others are included in anticipation of material to 
be developed later, the hope and rationale for this being both to lay the 
groundwork for the subsequent theory and also to make more natural ideas, 
definitions, and arguments as they are introduced. Several problems appear 
more than once. Problems that for some reason or other seem difficult to me 
are often starred (sometimes with two stars). However, even here there will 
be no agreement among mathematicians; many will feel that some unstarred 
problems should be starred and vice versa. 

Naturally, I am indebted to many people for suggestions, comments and 
criticisms. To mention just a few of these: Charles Curtis, Marshall Hall, 
Nathan Jacobson, Arthur Mattuck, and Maxwell Rosenlicht. I owe a great 
deal to Daniel Gorenstein and Irving Kaplansky for the numerous conversations
we have had about the book, its material and its approach.
Above all, I thank George Seligman for the many incisive suggestions and 
remarks that he has made about the presentation both as to its style and to 
its content. I am also grateful to Francis McNary of the staff of Ginn and 
Company for his help and cooperation. Finally, I should like to express my 
thanks to the John Simon Guggenheim Memorial Foundation; this book was 
in part written with their support while the author was in Rome as a 
Guggenheim Fellow. 
Preliminary Notions


One of the amazing features of twentieth century mathematics has 
been its recognition of the power of the abstract approach. This has 
given rise to a large body of new results and problems and has, in fact, 
led us to open up whole new areas of mathematics whose very existence 
had not even been suspected. 

In the wake of these developments has come not only a new 
mathematics but a fresh outlook, and along with this, simple new 
proofs of difficult classical results. The isolation of a problem into its 
basic essentials has often revealed for us the proper setting, in the whole 
scheme of things, of results considered to have been special and apart 
and has shown us interrelations between areas previously thought to 
have been unconnected. 

The algebra which has evolved as an outgrowth of all this is not 
only a subject with an independent life and vigor—it is one of the 
important current research areas in mathematics—but it also serves as 
the unifying thread which interlaces almost all of mathematics— 
geometry, number theory, analysis, topology, and even applied 
mathematics. 

This book is intended as an introduction to that part of mathematics 
that today goes by the name of abstract algebra. The term “abstract” 
is a highly subjective one; what is abstract to one person is very often 
concrete and down-to-earth to another, and vice versa. In relation to 
the current research activity in algebra, it could be described as 
“not too abstract”; from the point of view of someone schooled in the 
calculus and who is seeing the present material for the first time, it may very
well be described as “quite abstract.” 

Be that as it may, we shall concern ourselves with the introduction and 
development of some of the important algebraic systems—groups, rings, 
vector spaces, fields. An algebraic system can be described as a set of objects 
together with some operations for combining them. 

Prior to studying sets restricted in any way whatever—for instance, with 
operations—it will be necessary to consider sets in general and some notions 
about them. At the other end of the spectrum, we shall need some information
about the particular set, the set of integers. It is the purpose of this
chapter to discuss these and to derive some results about them which we can 
call upon, as the occasions arise, later in the book. 

1.1 Set Theory 

We shall not attempt a formal definition of a set nor shall we try to lay the 
groundwork for an axiomatic theory of sets. Instead we shall take the 
operational and intuitive approach that a set is some given collection of 
objects. In most of our applications we shall be dealing with rather specific 
things, and the nebulous notion of a set, in these, will emerge as something 
quite recognizable. For those whose tastes run more to the formal and 
abstract side, we can consider a set as a primitive notion which one does 
not define. 

A few remarks about notation and terminology. Given a set S we shall
use the notation $a \in S$ throughout to read “a is an element of S.” In the same
vein, $a \notin S$ will read “a is not an element of S.” The set A will be said to be
a subset of the set S if every element in A is an element of S, that is, if $a \in A$
implies $a \in S$. We shall write this as $A \subset S$ (or, sometimes, as $S \supset A$),
which may be read “A is contained in S” (or, S contains A). This notation
is not meant to preclude the possibility that A = S. By the way, what is
meant by the equality of two sets? For us this will always mean that they
contain the same elements, that is, every element which is in one is in the
other, and vice versa. In terms of the symbol for the containing relation, the
two sets A and B are equal, written A = B, if both $A \subset B$ and $B \subset A$.
The standard device for proving the equality of two sets, something we shall
be required to do often, is to demonstrate that the two opposite containing
relations hold for them. A subset A of S will be called a proper subset of S
if $A \subset S$ but $A \neq S$ (A is not equal to S).

The null set is the set having no elements; it is a subset of every set. We
shall often express the fact that a set S is the null set by saying that it is empty.

One final, purely notational remark: Given a set S we shall constantly
use the notation $A = \{a \in S \mid P(a)\}$ to read “A is the set of all elements in
S for which the property P holds.” For instance, if S is the set of integers
and if A is the subset of positive integers, then we can describe A as
$A = \{a \in S \mid a > 0\}$. Another example of this: If S is the set consisting of
the objects (1), (2), ..., (10), then the subset A consisting of (1), (4), (7),
(10) could be described by $A = \{(i) \in S \mid i = 3n + 1,\ n = 0, 1, 2, 3\}$.
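
As an informal aside that is not part of the original text, the set-builder notation has a close analogue in the set comprehensions of some programming languages. The following Python sketch builds the second example above; it rests on the assumption that the object (i) may be modeled by the integer i, and the variable names are our own.

```python
# S models the objects (1), (2), ..., (10); the object (i) is represented by the integer i.
S = set(range(1, 11))

# A = {(i) in S | i = 3n + 1 for some n = 0, 1, 2, 3}
A = {i for i in S if any(i == 3 * n + 1 for n in range(4))}

print(sorted(A))
```

The condition after `if` plays the role of the property P in the notation above.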

Given two sets we can combine them to form new sets. There is nothing 
sacred or particular about this number two; we can carry out the same procedure
for any number of sets, finite or infinite, and in fact we shall. We
do so for two first because it illustrates the general construction but is not 
obscured by the additional notational difficulties. 

DEFINITION The union of the two sets A and B, written as $A \cup B$, is the
set $\{x \mid x \in A \text{ or } x \in B\}$.

A word about the use of “or.” In ordinary English when we say that 
something is one or the other we imply that it is not both. The mathematical 
“or” is quite different, at least when we are speaking about set theory. For 
when we say that x is in A or x is in B we mean x is in at least one of A or B, and 
may be in both. 

Let us consider a few examples of the union of two sets. For any set A,
$A \cup A = A$; in fact, whenever B is a subset of A, $A \cup B = A$. If A is the
set $\{x_1, x_2, x_3\}$ (i.e., the set whose elements are $x_1, x_2, x_3$) and if B is the set
$\{y_1, y_2, x_1\}$, then $A \cup B = \{x_1, x_2, x_3, y_1, y_2\}$. If A is the set of all blonde-haired
people and if B is the set of all people who smoke, then $A \cup B$
consists of all the people who either have blonde hair or smoke or both.
Pictorially we can illustrate the union of the two sets A and B by

[Figure omitted: two overlapping circles.]

Here, A is the circle on the left, B that on the right, and $A \cup B$ is the shaded
part.

DEFINITION The intersection of the two sets A and B, written as $A \cap B$,
is the set $\{x \mid x \in A \text{ and } x \in B\}$.

The intersection of A and B is thus the set of all elements which are both 
in A and in B. In analogy with the examples used to illustrate the union of 
two sets, let us see what the intersections are in those very examples. For 
any set A, $A \cap A = A$; in fact, if B is any subset of A, then $A \cap B = B$.
If A is the set $\{x_1, x_2, x_3\}$ and B the set $\{y_1, y_2, x_1\}$, then $A \cap B = \{x_1\}$
(we are supposing no y is an x). If A is the set of all blonde-haired people
and if B is the set of all people that smoke, then $A \cap B$ is the set of all
blonde-haired people who smoke. Pictorially we can illustrate the intersection
of the two sets A and B by

[Figure omitted: two overlapping circles.]

Here A is the circle on the left, B that on the right, while their intersection
is the shaded part.

Two sets are said to be disjoint if their intersection is empty, that is, is 
the null set. For instance, if A is the set of positive integers and B the set of 
negative integers, then A and B are disjoint. Note however that if C is the 
set of nonnegative integers and if D is the set of nonpositive integers, then 
they are not disjoint, for their intersection consists of the integer 0, and so is 
not empty. 
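
For readers who like to experiment, here is a small Python sketch (an addition, not from the original text) of union, intersection, and disjointness. The strings standing for $x_1, x_2, x_3, y_1, y_2$ and the finite window of integers are our own assumptions; in particular, the infinite sets of positive and negative integers are only sampled on the range from -5 to 5.

```python
A = {"x1", "x2", "x3"}
B = {"y1", "y2", "x1"}

union = A | B           # elements in at least one of A, B (the inclusive "or")
intersection = A & B    # elements in both A and B

# A finite window standing in for the integers.
window = range(-5, 6)
positives = {n for n in window if n > 0}
negatives = {n for n in window if n < 0}
nonnegatives = {n for n in window if n >= 0}
nonpositives = {n for n in window if n <= 0}

print(sorted(union), sorted(intersection))
print(positives.isdisjoint(negatives))   # disjoint: empty intersection
print(nonnegatives & nonpositives)       # not disjoint: they share the integer 0
```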

Before we generalize union and intersection from two sets to an arbitrary 
number of them, we should like to prove a little proposition interrelating 
union and intersection. This is the first of a whole host of such results that 
can be proved; some of these can be found in the problems at the end of this 
section. 

PROPOSITION For any three sets, A, B, C we have 

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$.

Proof. The proof will consist of showing, to begin with, the relation
$(A \cap B) \cup (A \cap C) \subset A \cap (B \cup C)$ and then the converse relation
$A \cap (B \cup C) \subset (A \cap B) \cup (A \cap C)$.

We first dispose of $(A \cap B) \cup (A \cap C) \subset A \cap (B \cup C)$. Because
$B \subset B \cup C$, it is immediate that $A \cap B \subset A \cap (B \cup C)$. In a similar
manner, $A \cap C \subset A \cap (B \cup C)$. Therefore

$(A \cap B) \cup (A \cap C) \subset (A \cap (B \cup C)) \cup (A \cap (B \cup C)) = A \cap (B \cup C)$.

Now for the other direction. Given an element $x \in A \cap (B \cup C)$,
first of all it must be an element of A. Secondly, as an element in $B \cup C$ it
is either in B or in C. Suppose the former; then as an element both of A and
of B, x must be in $A \cap B$. The second possibility, namely, $x \in C$, leads us
to $x \in A \cap C$. Thus in either eventuality $x \in (A \cap B) \cup (A \cap C)$, whence
$A \cap (B \cup C) \subset (A \cap B) \cup (A \cap C)$.

The two opposite containing relations combine to give us the equality 
asserted in the proposition. 
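
A finite check is of course no substitute for the proof just given, but it can be reassuring to see the proposition hold in a concrete case. The Python sketch below (our own addition; the helper `subsets` and the three-element universe are illustrative assumptions) verifies the identity for every triple of subsets of {1, 2, 3}.

```python
from itertools import combinations

def subsets(universe):
    """Return all subsets of a finite collection, as frozensets."""
    items = list(universe)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]

U = {1, 2, 3}
all_subsets = subsets(U)

# Check A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) for every triple of subsets of U.
holds = all(A & (B | C) == (A & B) | (A & C)
            for A in all_subsets
            for B in all_subsets
            for C in all_subsets)
print(holds)
```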

We continue the discussion of sets to extend the notion of union and of 
intersection to arbitrary collections of sets. 

Given a set T we say that T serves as an index set for the family $\mathcal{F} = \{A_\alpha\}$
of sets if for every $\alpha \in T$ there exists a set $A_\alpha$ in the family $\mathcal{F}$. The index
set T can be any set, finite or infinite. Very often we use the set of nonnegative
integers as an index set, but, we repeat, T can be any (nonempty)
set.

By the union of the sets $A_\alpha$, where $\alpha$ is in T, we mean the set
$\{x \mid x \in A_\alpha \text{ for at least one } \alpha \text{ in } T\}$. We shall denote it by $\bigcup_{\alpha \in T} A_\alpha$. By the intersection
of the sets $A_\alpha$, where $\alpha$ is in T, we mean the set $\{x \mid x \in A_\alpha \text{ for every } \alpha \in T\}$;
we shall denote it by $\bigcap_{\alpha \in T} A_\alpha$. The sets $A_\alpha$ are mutually disjoint if for $\alpha \neq \beta$,
$A_\alpha \cap A_\beta$ is the null set.

For instance, if S is the set of real numbers, and if T is the set of rational
numbers, let, for $\alpha \in T$, $A_\alpha = \{x \in S \mid x > \alpha\}$. It is an easy exercise to see
that $\bigcup_{\alpha \in T} A_\alpha = S$ whereas $\bigcap_{\alpha \in T} A_\alpha$ is the null set. The sets $A_\alpha$ are not
mutually disjoint.
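
The example with real and rational numbers cannot be reproduced literally on a computer, but a finite analogue gives the flavor. In the Python sketch below (an addition to the text) we assume a finite set S = {0, 1, ..., 9} and a finite index set T = {0, 1, 2, 3}; because T is finite here, the intersection is not empty, unlike in the example above.

```python
from functools import reduce

S = set(range(10))
T = [0, 1, 2, 3]   # a finite, nonempty index set

# The family {A_alpha}: one set for every alpha in T.
family = {alpha: {x for x in S if x > alpha} for alpha in T}

union = set().union(*family.values())                      # x in at least one A_alpha
intersection = reduce(set.intersection, family.values())   # x in every A_alpha
mutually_disjoint = all(family[a].isdisjoint(family[b])
                        for a in T for b in T if a != b)

print(sorted(union), sorted(intersection), mutually_disjoint)
```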

DEFINITION Given the two sets A, B then the difference set, $A - B$, is the
set $\{x \in A \mid x \notin B\}$.

Returning to our little pictures, if A is the circle on the left, B that on the
right, then $A - B$ is the shaded area.

[Figure omitted: two overlapping circles.]

Note that for any set B, the set A satisfies $A = (A \cap B) \cup (A - B)$.
(Prove!) Note further that $B \cap (A - B)$ is the null set. A particular case
of interest of the difference of two sets is when one of these is a subset of the
other. In that case, when B is a subset of A, we call $A - B$ the complement
of B in A.
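
The two remarks about the difference set can be tried out on a small case. The following Python sketch is our own illustration with arbitrarily chosen sets; it checks an instance of the identities and is not a proof of them.

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6}

difference = A - B               # {x in A | x not in B}
recombined = (A & B) | (A - B)   # should give back A
overlap = B & (A - B)            # should be the null set

print(difference, recombined == A, overlap == set())
```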

We still want one more construct of two given sets A and B, their Cartesian
product $A \times B$. This set $A \times B$ is defined as the set of all ordered pairs
(a, b) where $a \in A$ and $b \in B$ and where we declare the pair $(a_1, b_1)$ to be
equal to $(a_2, b_2)$ if and only if $a_1 = a_2$ and $b_1 = b_2$.
A few remarks about the Cartesian product. Given the two sets A and B
we could construct the sets $A \times B$ and $B \times A$ from them. As sets these are
distinct, yet we feel that they must be closely related. Given three sets A,
B, C we can construct many Cartesian products from them: for instance, the
set $A \times D$, where $D = B \times C$; the set $E \times C$, where $E = A \times B$; and
also the set of all ordered triples (a, b, c) where $a \in A$, $b \in B$, and $c \in C$.
These give us three distinct sets, yet here, also, we feel that these sets must
be closely related. Of course, we can continue this process with more and
more sets. To see the exact relation between them we shall have to wait
until the next section, where we discuss one-to-one correspondences.

Given any index set T we could define the Cartesian product of the sets
$A_\alpha$ as $\alpha$ varies over T; since we shall not need so general a product, we do
not bother to define it.

Finally, we can consider the Cartesian product of a set A with itself,
$A \times A$. Note that if the set A is a finite set having n elements, then the set
$A \times A$ is also a finite set, but has $n^2$ elements. The set of elements (a, a) in
$A \times A$ is called the diagonal of $A \times A$.
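
As a hedged illustration (added here, not part of the original), the Cartesian product of finite sets can be formed with `itertools.product` from the Python standard library, representing an ordered pair by a tuple. The particular sets A and B are arbitrary choices of ours.

```python
from itertools import product

A = {"a", "b", "c"}
B = {0, 1}

AxB = set(product(A, B))          # all ordered pairs (a, b)
BxA = set(product(B, A))          # all ordered pairs (b, a)
AxA = set(product(A, repeat=2))   # A x A
diagonal = {(a, a) for a in A}    # the pairs (a, a)

print(len(AxB), AxB == BxA, len(AxA), diagonal <= AxA)
```

Here $A \times B$ and $B \times A$ have the same number of elements yet are different sets, which is the phenomenon remarked on above.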

A subset R of $A \times A$ is said to define an equivalence relation on A if

1. $(a, a) \in R$ for all $a \in A$.

2. $(a, b) \in R$ implies $(b, a) \in R$.

3. $(a, b) \in R$ and $(b, c) \in R$ imply that $(a, c) \in R$.

Instead of speaking about subsets of $A \times A$ we can speak about a binary
relation (one between two elements of A) on A itself, defining b to be related
to a if $(a, b) \in R$. The properties 1, 2, 3 of the subset R immediately translate
into the properties 1, 2, 3 of the definition below.

DEFINITION The binary relation ~ on A is said to be an equivalence 
relation on A if for all a, b, c in A 

1. a ~ a.

2. a ~ b implies b ~ a. 

3. a ~ b and b ~ c imply a ~ c. 

The first of these properties is called reflexivity, the second, symmetry, and
the third, transitivity. 
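
On a finite set the three properties can be checked mechanically. The Python sketch below is our own addition; the function names and the two sample relations are illustrative assumptions. It tests the relation "a - b is even" and, for contrast, the relation "a and b differ by at most 1", which is reflexive and symmetric but fails transitivity.

```python
def is_equivalence(elements, related):
    """Check reflexivity, symmetry, and transitivity of `related` on a finite collection."""
    elems = list(elements)
    reflexive = all(related(a, a) for a in elems)
    symmetric = all(related(b, a)
                    for a in elems for b in elems if related(a, b))
    transitive = all(related(a, c)
                     for a in elems for b in elems for c in elems
                     if related(a, b) and related(b, c))
    return reflexive and symmetric and transitive

def same_parity(a, b):
    return (a - b) % 2 == 0    # a - b is even

def close(a, b):
    return abs(a - b) <= 1     # fails transitivity: 0 ~ 1 and 1 ~ 2, but not 0 ~ 2

sample = range(-6, 7)          # a finite window of integers
print(is_equivalence(sample, same_parity), is_equivalence(sample, close))
```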

The concept of an equivalence relation is an extremely important one 
and plays a central role in all of mathematics. We illustrate it with a few 
examples. 

Example 1.1.1 Let S be any set and define a ~ b, for $a, b \in S$, if and
only if a = b. This clearly defines an equivalence relation on S. In fact, an 
equivalence relation is a generalization of equality, measuring equality up 
to some property. 
Example 1.1.2 Let S be the set of all integers. Given $a, b \in S$, define
a ~ b if $a - b$ is an even integer. We verify that this defines an equivalence
relation on S.

1. Since $0 = a - a$ is even, a ~ a.

2. If a ~ b, that is, if $a - b$ is even, then $b - a = -(a - b)$ is also even,
whence b ~ a.

3. If a ~ b and b ~ c, then both $a - b$ and $b - c$ are even, whence
$a - c = (a - b) + (b - c)$ is also even, proving that a ~ c.

Example 1.1.3 Let S be the set of all integers and let n > 1 be a fixed
integer. Define for $a, b \in S$, a ~ b if $a - b$ is a multiple of n. We leave it
as an exercise to prove that this defines an equivalence relation on S. 

Example 1.1.4 Let S be the set of all triangles in the plane. Two 
triangles are defined to be equivalent if they are similar (i.e., have corresponding
angles equal). This defines an equivalence relation on S.

Example 1.1.5 Let S be the set of points in the plane. Two points a and 
b are defined to be equivalent if they are equidistant from the origin. A 
simple check verifies that this defines an equivalence relation on S. 

There are many more equivalence relations; we shall encounter a few as 
we proceed in the book. 

DEFINITION If A is a set and if ~ is an equivalence relation on A, then
the equivalence class of $a \in A$ is the set $\{x \in A \mid a \sim x\}$. We write it as cl(a).

In the examples just discussed, what are the equivalence classes? In 
Example 1.1.1, the equivalence class of a consists merely of a itself. In 
Example 1.1.2 the equivalence class of a consists of all the integers of the 
form $a + 2m$, where $m = 0, \pm 1, \pm 2, \ldots$; in this example there are only
two distinct equivalence classes, namely, cl(0) and cl(1). In Example 1.1.3,
the equivalence class of a consists of all integers of the form $a + kn$ where
$k = 0, \pm 1, \pm 2, \ldots$; here there are n distinct equivalence classes, namely
cl(0), cl(1), ..., cl(n - 1). In Example 1.1.5, the equivalence class of a
consists of all the points in the plane which lie on the circle which has its 
center at the origin and passes through a. 
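
To see the classes of Example 1.1.3 concretely, one can group a finite window of integers by the relation. The Python sketch below is an addition of ours; the helper `equivalence_classes`, the choice n = 3, and the window from -6 to 6 are assumptions made for illustration, and the helper presumes that the relation handed to it really is an equivalence relation.

```python
def equivalence_classes(elements, related):
    """Group a finite collection into classes of an (assumed) equivalence relation."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(cls[0], x):   # comparing with one representative suffices
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes

n = 3
sample = range(-6, 7)                # a finite window of integers
classes = equivalence_classes(sample, lambda a, b: (a - b) % n == 0)

for cls in classes:
    print(cls)
```

Within the window there are exactly n = 3 classes, in agreement with the count cl(0), cl(1), ..., cl(n - 1) stated above.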

Although we have made quite a few definitions, introduced some concepts, 
and have even established a simple little proposition, one could say in all 
fairness that up to this point we have not proved any result of real substance. 
We are now about to prove the first genuine result in the book. The proof 
of this theorem is not very difficult—actually it is quite easy—but nonetheless 
the result it embodies will be of great use to us. 
THEOREM 1.1.1 The distinct equivalence classes of an equivalence relation on A
provide us with a decomposition of A as a union of mutually disjoint subsets. Conversely, 
given a decomposition of A as a union of mutually disjoint, nonempty subsets, we can 
define an equivalence relation on A for which these subsets are the distinct equivalence 
classes. 

Proof. Let the equivalence relation on A be denoted by ~. 

We first note that since for any $a \in A$, a ~ a, a must be in cl(a), whence
the union of the cl(a)’s is all of A. We now assert that given two equivalence
classes they are either equal or disjoint. For, suppose that cl(a) and cl(b)
are not disjoint; then there is an element $x \in \mathrm{cl}(a) \cap \mathrm{cl}(b)$. Since $x \in \mathrm{cl}(a)$,
a ~ x; since $x \in \mathrm{cl}(b)$, b ~ x, whence by the symmetry of the relation,
x ~ b. However, a ~ x and x ~ b by the transitivity of the relation forces
a ~ b. Suppose, now that $y \in \mathrm{cl}(b)$; thus b ~ y. However, from a ~ b
and b ~ y, we deduce that a ~ y, that is, that $y \in \mathrm{cl}(a)$. Therefore, every
element in cl(b) is in cl(a), which proves that $\mathrm{cl}(b) \subset \mathrm{cl}(a)$. The argument
is clearly symmetric, whence we conclude that $\mathrm{cl}(a) \subset \mathrm{cl}(b)$. The two
opposite containing relations imply that cl(a) = cl(b).

We have thus shown that the distinct cl(a)’s are mutually disjoint and 
that their union is A. This proves the first half of the theorem. Now for 
the other half! 

Suppose that $A = \bigcup A_\alpha$ where the $A_\alpha$ are mutually disjoint, nonempty
sets ($\alpha$ is in some index set T). How shall we use them to define an equivalence
relation? The way is clear; given an element a in A it is in exactly one
$A_\alpha$. We define for $a, b \in A$, a ~ b if a and b are in the same $A_\alpha$. We leave
it as an exercise to prove that this is an equivalence relation on A and that
the distinct equivalence classes are the $A_\alpha$’s.
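
The construction in the second half of the proof can be watched at work on a small example. In the Python sketch below (our own illustration, with an arbitrarily chosen partition) we define a ~ b to mean that a and b lie in the same block, and then recover the blocks as the equivalence classes. This checks one instance only and does not replace the exercise.

```python
# A decomposition of A into mutually disjoint, nonempty subsets.
partition = [{1, 4}, {2}, {3, 5, 6}]
A = set().union(*partition)

def related(a, b):
    """a ~ b if a and b lie in the same block of the partition."""
    return any(a in block and b in block for block in partition)

def cl(a):
    """The equivalence class of a under the relation above."""
    return {x for x in A if related(a, x)}

recovered = {frozenset(cl(a)) for a in A}
print(recovered == {frozenset(block) for block in partition})
```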


Problems 

1. (a) If A is a subset of B and B is a subset of C, prove that A is a subset
of C.

(b) If $B \subset A$, prove that $A \cup B = A$, and conversely.

(c) If $B \subset A$, prove that for any set C both $B \cup C \subset A \cup C$ and
$B \cap C \subset A \cap C$.

2. (a) Prove that $A \cap B = B \cap A$ and $A \cup B = B \cup A$.

(b) Prove that $(A \cap B) \cap C = A \cap (B \cap C)$.

3. Prove that $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

4. For a subset C of S let C' denote the complement of C in S. For any
two subsets A, B of S prove the De Morgan rules:

(a) $(A \cap B)' = A' \cup B'$.

(b) $(A \cup B)' = A' \cap B'$.

5. For a finite set C let o(C) indicate the number of elements in C. If A
and B are finite sets prove $o(A \cup B) = o(A) + o(B) - o(A \cap B)$.
6. If A is a finite set having n elements, prove that A has exactly $2^n$ distinct
subsets.

7. A survey shows that 63% of the American people like cheese whereas 
76% like apples. What can you say about the percentage of the 
American people that like both cheese and apples? (The given statistics 
are not meant to be accurate.) 

8. Given two sets A and B their symmetric difference is defined to be
$(A - B) \cup (B - A)$. Prove that the symmetric difference of A and B
equals $(A \cup B) - (A \cap B)$.

9. Let S be a set and let S* be the set whose elements are the various subsets
of S. In S* we define an addition and multiplication as follows: If
$A, B \in S^*$ (remember, this means that they are subsets of S):

(1) $A + B = (A - B) \cup (B - A)$.

(2) $A \cdot B = A \cap B$.

Prove the following laws that govern these operations:

(a) $(A + B) + C = A + (B + C)$.

(b) $A \cdot (B + C) = A \cdot B + A \cdot C$.

(c) $A \cdot A = A$.

(d) A + A = null set.

(e) If A + B = A + C then B = C.

(The system just described is an example of a Boolean algebra.)

10. For the given set and relation below determine which define equivalence 
relations. 

(a) S is the set of all people in the world today, a ~ b if a and b have 
an ancestor in common. 

(b) S is the set of all people in the world today, a ~ b if a lives within 
100 miles of b. 

(c) S is the set of all people in the world today, a ~ b if a and b have 
the same father. 

(d) S is the set of real numbers, a ~ b if $a = \pm b$.

(e) S is the set of integers, a ~ b if both a > b and b > a. 

(f) S is the set of all straight lines in the plane, a ~ b if a is parallel to b. 

11. (a) Property 2 of an equivalence relation states that if a ~ b then
b ~ a; property 3 states that if a ~ b and b ~ c then a ~ c.
What is wrong with the following proof that properties 2 and 3
imply property 1? Let a ~ b; then b ~ a, whence, by property 3
(using a = c), a ~ a.

(b) Can you suggest an alternative of property 1 which will insure us
that properties 2 and 3 do imply property 1?

12. In Example 1.1.3 of an equivalence relation given in the text, prove 
that the relation defined is an equivalence relation and that there are 
exactly n distinct equivalence classes, namely, cl(0), cl(1), ..., cl(n - 1).

13. Complete the proof of the second half of Theorem 1.1.1. 
1.2 Mappings

We are about to introduce the concept of a mapping of one set into another. 
Without exaggeration this is probably the single most important and universal
notion that runs through all of mathematics. It is hardly a new thing
to any of us, for we have been considering mappings from the very earliest
days of our mathematical training. When we were asked to plot the relation
$y = x^2$ we were simply being asked to study the particular mapping which
takes every real number onto its square.

Loosely speaking, a mapping from one set, S, into another, T, is a “rule” 
(whatever that may mean) that associates with each element in S a unique 
element t in T. We shall define a mapping somewhat more formally and 
precisely but the purpose of the definition is to allow us to think and speak 
in the above terms. We should think of them as rules or devices or mechanisms 
that transport us from one set to another. 

Let us motivate a little the definition that we will make. The point of 
view we take is to consider the mapping to be defined by its “graph.” We 
illustrate this with the familiar example y = x^2 defined on the real numbers 
S and taking its values also in S. For this set S, S × S, the set of all pairs 
(a, b) can be viewed as the plane, the pair (a, b) corresponding to the point 
whose coordinates are a and b, respectively. In this plane we single out all 
those points whose coordinates are of the form (x, x^2) and call this set of 
points the graph of y = x^2. We even represent this set pictorially as 

[Figure: the graph of y = x^2 in the plane] 

To find the “value” of the function or mapping at the point x = a, we look 
at the point in the graph whose first coordinate is a and read off the second 
coordinate as the value of the function at x = a. 

This is, no more or less, the approach we take in the general setting to 
define a mapping from one set into another. 

DEFINITION If S and T are nonempty sets, then a mapping from S to T 
is a subset, M, of S × T such that for every s ∈ S there is a unique t ∈ T such 
that the ordered pair (s, t) is in M. 

This definition serves to make the concept of a mapping precise for us but 
we shall almost never use it in this form. Instead we do prefer to think of a 
mapping as a rule which associates with any element s in S some element 
t in T, the rule being, associate (or map) s ∈ S with t ∈ T if and only if (s, t) ∈ M. 
We shall say that t is the image of s under the mapping. 
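
For finite sets the definition can be checked mechanically. The following Python sketch is only an illustration under our own assumptions: the names is_mapping and image, and the small sets used, are ours and do not come from the text. A candidate M is stored as a set of ordered pairs, and we test that every s in S occurs as a first coordinate exactly once.

```python
def is_mapping(M, S, T):
    """Return True if the set of pairs M is a mapping from S to T."""
    if not S or not T:
        return False  # the definition assumes nonempty sets
    if any(s not in S or t not in T for (s, t) in M):
        return False  # M must be a subset of S x T
    # every s in S must occur as a first coordinate exactly once
    return all(sum(1 for (a, _) in M if a == s) == 1 for s in S)


def image(M, s):
    """Return the unique t with (s, t) in M, assuming M is a mapping."""
    for (a, t) in M:
        if a == s:
            return t
    raise KeyError(s)


# A finite piece of the graph of y = x^2 (illustrative choice of S and T).
S = {-2, -1, 0, 1, 2}
T = {0, 1, 4}
graph = {(x, x * x) for x in S}
not_a_mapping = graph | {(1, 4)}  # here 1 would have two images
```

Reading off image(graph, s) is the "look at the point whose first coordinate is s" step described above; the set not_a_mapping fails because the image of 1 is no longer unique.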

Now for some notation for these things. Let σ be a mapping from S to 
T; we often denote this by writing σ: S → T or S --σ--> T. If t is the image of 
s under σ we shall sometimes write this as σ: s → t; more often, we shall 
represent this fact by t = sσ. Note that we write the mapping σ on the 
right. There is no overall consistency in this usage; many people would 
write it as t = σ(s). Algebraists often write mappings on the right; other 
mathematicians write them on the left. In fact, we shall not be absolutely 
consistent in this ourselves; when we shall want to emphasize the functional 
nature of σ we may very well write t = σ(s). 


Examples of Mappings 

In all the examples the sets are assumed to be nonempty. 

Example 1.2.1 Let S be any set; define i: S → S by s = si for any 
s ∈ S. This mapping i is called the identity mapping of S. 

Example 1.2.2 Let S and T be any sets and let t_0 be an element of T. 
Define τ: S → T by τ: s → t_0 for every s ∈ S. 

Example 1.2.3 Let S be the set of positive rational numbers and let 
T = J × J where J is the set of integers. Given a rational number s we 
can write it as s = m/n, where m and n have no common factor. Define 
τ: S → T by sτ = (m, n). 

Example 1.2.4 Let J be the set of integers and S = {(m, n) ∈ J × J | n ≠ 0}; 
let T be the set of rational numbers; define τ: S → T by (m, n)τ = m/n for 
every (m, n) in S. 

Example 1.2.5 Let J be the set of integers and S = J × J. Define 
τ: S → J by (m, n)τ = m + n. 

Note that in Example 1.2.5 the addition in J itself can be represented in 
terms of a mapping of J × J into J. Given an arbitrary set S we call a 
mapping of S × S into S a binary operation on S. Given such a mapping 
τ: S × S → S we could use it to define a “product” * in S by declaring 
a * b = c if (a, b)τ = c. 

Example 1.2.6 Let S and T be any sets; define τ: S × T → S by 
(a, b)τ = a for any (a, b) ∈ S × T. This τ is called the projection of S × T 
on S. We could similarly define the projection of S × T on T. 

Example 1.2.7 Let S be the set consisting of the elements x_1, x_2, x_3. 
Define τ: S → S by x_1τ = x_2, x_2τ = x_3, x_3τ = x_1. 

Example 1.2.8 Let S be the set of integers and let T be the set consisting 
of the elements E and 0. Define τ: S → T by declaring nτ = E if n is even 
and nτ = 0 if n is odd. 

If S is any set, let {x_1, ..., x_n} be its subset consisting of the elements 
x_1, x_2, ..., x_n of S. In particular, {x} is the subset of S whose only element 
is x. Given S we can use it to construct a new set S*, the set whose elements 
are the subsets of S. We call S* the set of subsets of S. Thus for instance, if 
S = {x_1, x_2} then S* has exactly four elements, namely, a_1 = null set, 
a_2 = the subset, S, of S, a_3 = {x_1}, a_4 = {x_2}. The relation of S to S*, 
in general, is a very interesting one; some of its properties are examined in 
the problems. 

Example 1.2.9 Let S be a set, T = S*; define τ: S → T by sτ = 
complement of {s} in S = S - {s}. 

Example 1.2.10 Let S be a set with an equivalence relation, and let 
T be the set of equivalence classes in S (note that T is a subset of S*). 
Define τ: S → T by sτ = cl(s). 

We leave the examples to continue the general discussion. Given a 
mapping τ: S → T we define for t ∈ T, the inverse image of t with respect to τ 
to be the set {s ∈ S | t = sτ}. In Example 1.2.8, the inverse image of E is 
the subset of S consisting of the even integers. It may happen that for some 
t in T its inverse image with respect to τ is empty; that is, t is not the 
image under τ of any element in S. In Example 1.2.3, the element (4, 2) is 
not the image of any element in S under the τ used; in Example 1.2.9, S, 
as an element in S*, is not the image under the τ used of any element in S. 

DEFINITION The mapping τ of S into T is said to be onto T if given 
t ∈ T there exists an element s ∈ S such that t = sτ. 

If we call the subset Sτ = {x ∈ T | x = sτ for some s ∈ S} the image of S 
under τ, then τ is onto if the image of S under τ is all of T. Note that in 
Examples 1.2.1, 1.2.4-1.2.8, and 1.2.10 the mappings used are all onto. 

Another special type of mapping arises often and is important: the 
one-to-one mapping. 

DEFINITION The mapping τ of S into T is said to be a one-to-one mapping 
if whenever s_1 ≠ s_2, then s_1τ ≠ s_2τ. 

In terms of inverse images, the mapping τ is one-to-one if for any t ∈ T 
the inverse image of t is either empty or is a set consisting of one element. 
In the examples discussed, the mappings in Examples 1.2.1, 1.2.3, 1.2.7, 
and 1.2.9 are all one-to-one. 
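
On finite sets, inverse images and the two properties just defined can be computed directly. The sketch below is an illustration only; it assumes a mapping is stored as a Python dict sending each s to its image, and the function names are ours. Example 1.2.8 is cut down to a finite window of integers so that it can be tabulated, and 1, 2, 3 stand in for x_1, x_2, x_3 of Example 1.2.7.

```python
def inverse_image(tau, t):
    """Return the set of all s whose image under tau is t."""
    return {s for s, img in tau.items() if img == t}


def is_onto(tau, T):
    """Return True if every element of T is the image of some s."""
    return set(tau.values()) == set(T)


def is_one_to_one(tau):
    """Return True if distinct elements always have distinct images."""
    return len(set(tau.values())) == len(tau)


# Example 1.2.8, restricted to the integers from -4 to 4 (our assumption).
parity = {n: "E" if n % 2 == 0 else "0" for n in range(-4, 5)}

# Example 1.2.7, with 1, 2, 3 standing for x_1, x_2, x_3.
cycle = {1: 2, 2: 3, 3: 1}
```

The parity mapping is onto but not one-to-one, since the inverse image of E has more than one element; the mapping of Example 1.2.7 is both.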

When should we say that two mappings from S to T are equal? A natural 
definition for this is that they should have the same effect on every element 
of S; that is, the image of any element in S under each of these mappings 
should be the same. In a little more formal manner: 

DEFINITION The two mappings σ and τ of S into T are said to be equal 
if sσ = sτ for every s ∈ S. 

Consider the following situation: We have a mapping σ from S to T and 
another mapping τ from T to U. Can we compound these mappings to 
produce a mapping from S to U? The most natural and obvious way of 
doing this is to send a given element s, in S, in two stages into U, first by 
applying σ to s and then applying τ to the resulting element sσ in T. This 
is the basis of the 

DEFINITION If σ: S → T and τ: T → U then the composition of σ and τ 
(also called their product) is the mapping σ ∘ τ: S → U defined by means of 
s(σ ∘ τ) = (sσ)τ for every s ∈ S. 

Note that the order of events reads from left to right; σ ∘ τ reads: first 
perform σ and then follow it up with τ. Here, too, the left-right business is 
not a uniform one. Mathematicians who write their mappings on the left 
would read σ ∘ τ to mean first perform τ and then σ. Accordingly, in 
reading a given book in mathematics one must make absolutely sure as to 
what convention is being followed in writing the product of two mappings. 
We reiterate, for us σ ∘ τ will always mean: first apply σ and then τ. 

We illustrate the composition of σ and τ with a few examples. 

Example 1.2.11 Let S = {x_1, x_2, x_3} and let T = S. Let σ: S → S be 
defined by 

x_1σ = x_2, 
x_2σ = x_3, 
x_3σ = x_1, 

and τ: S → S by 

x_1τ = x_1, 
x_2τ = x_3, 
x_3τ = x_2. 

Thus 

x_1(σ ∘ τ) = (x_1σ)τ = x_2τ = x_3, 

x_2(σ ∘ τ) = (x_2σ)τ = x_3τ = x_2, 

x_3(σ ∘ τ) = (x_3σ)τ = x_1τ = x_1. 

At the same time we can compute τ ∘ σ, because in this case it also makes 
sense. Now 

x_1(τ ∘ σ) = (x_1τ)σ = x_1σ = x_2, 

x_2(τ ∘ σ) = (x_2τ)σ = x_3σ = x_1, 

x_3(τ ∘ σ) = (x_3τ)σ = x_2σ = x_3. 

Note that x_2 = x_1(τ ∘ σ) whereas x_3 = x_1(σ ∘ τ) whence σ ∘ τ ≠ τ ∘ σ. 
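
The computation in Example 1.2.11 can be replayed in a few lines of Python. This is a sketch under our own conventions: mappings are dicts, 1, 2, 3 stand for x_1, x_2, x_3, and the hypothetical helper compose follows the left-to-right rule of this section (first the left factor, then the right one).

```python
def compose(sigma, tau):
    """Return sigma o tau in the sense of the text: first sigma, then tau."""
    return {s: tau[sigma[s]] for s in sigma}


# Example 1.2.11, with 1, 2, 3 standing for x_1, x_2, x_3.
sigma = {1: 2, 2: 3, 3: 1}
tau = {1: 1, 2: 3, 3: 2}

sigma_tau = compose(sigma, tau)   # first sigma, then tau
tau_sigma = compose(tau, sigma)   # first tau, then sigma
```

Comparing sigma_tau[1] with tau_sigma[1] reproduces the observation that the two products differ on x_1.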

Example 1.2.12 Let S be the set of integers, T the set S × S, and suppose 
σ: S → T is defined by mσ = (m - 1, 1). Let U = S and suppose that 
τ: T → U (= S) is defined by (m, n)τ = m + n. Thus σ ∘ τ: S → S whereas 
τ ∘ σ: T → T; even to speak about the equality of σ ∘ τ and τ ∘ σ would 
make no sense since they do not act on the same space. We now compute 
σ ∘ τ as a mapping of S into itself and then τ ∘ σ as one on T into itself. 

Given m ∈ S, mσ = (m - 1, 1) whence m(σ ∘ τ) = (mσ)τ = (m - 1, 1)τ = 
(m - 1) + 1 = m. Thus σ ∘ τ is the identity mapping of S into itself. What 
about τ ∘ σ? Given (m, n) ∈ T, (m, n)τ = m + n, whereby (m, n)(τ ∘ σ) = 
((m, n)τ)σ = (m + n)σ = (m + n - 1, 1). Note that τ ∘ σ is not the identity 
map of T into itself; it is not even an onto mapping of T. 

Example 1.2.13 Let S be the set of real numbers, T the set of integers, 
and U = {E, 0}. Define σ: S → T by sσ = largest integer less than or 
equal to s, and τ: T → U defined by nτ = E if n is even, nτ = 0 if n is odd. 
Note that in this case τ ∘ σ cannot be defined. We compute σ ∘ τ for two 
real numbers s = 5/2 and s = π. Now since 5/2 = 2 + 1/2, (5/2)σ = 2, whence 
(5/2)(σ ∘ τ) = ((5/2)σ)τ = (2)τ = E; (π)σ = 3, whence π(σ ∘ τ) = (πσ)τ = 
(3)τ = 0. 

For mappings of sets, provided the requisite products make sense, a 
general associative law holds. This is the content of 

LEMMA 1.2.1 (Associative Law) If σ: S → T, τ: T → U, and μ: U → V, 
then (σ ∘ τ) ∘ μ = σ ∘ (τ ∘ μ). 

Proof. Note first that σ ∘ τ makes sense and takes S into U, thus 
(σ ∘ τ) ∘ μ also makes sense and takes S into V. Similarly σ ∘ (τ ∘ μ) is 
meaningful and takes S into V. Thus we can speak about the equality, or 
lack of equality, of (σ ∘ τ) ∘ μ and σ ∘ (τ ∘ μ). 

To prove the asserted equality we merely must show that for any s ∈ S, 
s((σ ∘ τ) ∘ μ) = s(σ ∘ (τ ∘ μ)). Now by the very definition of the composition 
of maps, s((σ ∘ τ) ∘ μ) = (s(σ ∘ τ))μ = ((sσ)τ)μ whereas s(σ ∘ (τ ∘ μ)) = 
(sσ)(τ ∘ μ) = ((sσ)τ)μ. Thus, the elements s((σ ∘ τ) ∘ μ) and s(σ ∘ (τ ∘ μ)) 
are indeed equal. This proves the lemma. 

We should like to show that if two mappings σ and τ are properly conditioned 
the very same conditions carry over to σ ∘ τ. 


LEMMA 1.2.2 Let σ: S → T and τ: T → U; then 

1. σ ∘ τ is onto if each of σ and τ is onto. 

2. σ ∘ τ is one-to-one if each of σ and τ is one-to-one. 


Proof. We prove only part 2, leaving the proof of part 1 as an exercise. 

Suppose that s_1, s_2 ∈ S and that s_1 ≠ s_2. By the one-to-one nature of σ, 
s_1σ ≠ s_2σ. Since τ is one-to-one and s_1σ and s_2σ are distinct elements of T, 
(s_1σ)τ ≠ (s_2σ)τ whence s_1(σ ∘ τ) = (s_1σ)τ ≠ (s_2σ)τ = s_2(σ ∘ τ), proving 
that σ ∘ τ is indeed one-to-one, and establishing the lemma. 


Suppose that σ is a one-to-one mapping of S onto T; we call σ a one-to-one 
correspondence between S and T. Given any t ∈ T, by the “onto-ness” of σ 
there exists an element s ∈ S such that t = sσ; by the “one-to-oneness” of 
σ this s is unique. We define the mapping σ^{-1}: T → S by s = tσ^{-1} if and 
only if t = sσ. The mapping σ^{-1} is called the inverse of σ. Let us compute 
σ ∘ σ^{-1} which maps S into itself. Given s ∈ S, let t = sσ, whence by 
definition s = tσ^{-1}; thus s(σ ∘ σ^{-1}) = (sσ)σ^{-1} = tσ^{-1} = s. We have shown 
that σ ∘ σ^{-1} is the identity mapping of S onto itself. A similar computation 
reveals that σ^{-1} ∘ σ is the identity mapping of T onto itself. 

Conversely, if σ: S → T is such that there exists a μ: T → S with the 
property that σ ∘ μ and μ ∘ σ are the identity mappings on S and T, respectively, 
then we claim that σ is a one-to-one correspondence between S and T. 
First observe that σ is onto for, given t ∈ T, t = t(μ ∘ σ) = (tμ)σ (since 
μ ∘ σ is the identity on T) and so t is the image under σ of the element tμ in 
S. Next observe that σ is one-to-one, for if s_1σ = s_2σ, using that σ ∘ μ is the 
identity on S, we have s_1 = s_1(σ ∘ μ) = (s_1σ)μ = (s_2σ)μ = s_2(σ ∘ μ) = s_2. 
We have now proved 


LEMMA 1.2.3 The mapping σ: S → T is a one-to-one correspondence between 
S and T if and only if there exists a mapping μ: T → S such that σ ∘ μ and μ ∘ σ 
are the identity mappings on S and T, respectively. 
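
For finite sets, Lemma 1.2.3 can be watched in action. The sketch below is illustrative only: mappings are dicts, the helper names compose, inverse and identity are ours, and the particular S, T and mapping are made up for the example. The inverse is obtained by swapping each pair (s, t), which is exactly the rule s = tσ^{-1} if and only if t = sσ.

```python
def compose(sigma, tau):
    """Return sigma o tau in the sense of the text: first sigma, then tau."""
    return {s: tau[sigma[s]] for s in sigma}


def inverse(sigma):
    """Return the inverse of a one-to-one mapping stored as a dict."""
    mu = {t: s for s, t in sigma.items()}
    if len(mu) != len(sigma):
        raise ValueError("sigma is not one-to-one, so it has no inverse")
    return mu


def identity(S):
    """Return the identity mapping on S."""
    return {s: s for s in S}


# A one-to-one correspondence between two three-element sets (our example).
S = {1, 2, 3}
T = {"a", "b", "c"}
sigma = {1: "b", 2: "c", 3: "a"}
mu = inverse(sigma)
```

Here compose(sigma, mu) is the identity on S and compose(mu, sigma) is the identity on T, as the lemma requires.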

DEFINITION If S is a nonempty set then A(S) is the set of all one-to-one 
mappings of S onto itself. 

Aside from its own intrinsic interest A(S) plays a central and universal 
type of role in considering the mathematical system known as a group 
(Chapter 2). For this reason we state the next theorem concerning its 
nature. All the constituent parts of the theorem have already been proved 
in the various lemmas, so we state the theorem without proof. 

THEOREM 1.2.1 If σ, τ, μ are elements of A(S), then 

1. σ ∘ τ is in A(S). 

2. (σ ∘ τ) ∘ μ = σ ∘ (τ ∘ μ). 

3. There exists an element i (the identity map) in A(S) such that σ ∘ i = i ∘ σ = σ. 

4. There exists an element σ^{-1} ∈ A(S) such that σ ∘ σ^{-1} = σ^{-1} ∘ σ = i. 

We close the section with a remark about A(S). Suppose that S has more 
than two elements; let x_1, x_2, x_3 be three distinct elements in S; define the 
mapping σ: S → S by x_1σ = x_2, x_2σ = x_3, x_3σ = x_1, sσ = s for any 
s ∈ S different from x_1, x_2, x_3. Define the mapping τ: S → S by x_2τ = x_3, 
x_3τ = x_2, and sτ = s for any s ∈ S different from x_2, x_3. Clearly both σ and 
τ are in A(S). A simple computation shows that x_1(σ ∘ τ) = x_3 but that 
x_1(τ ∘ σ) = x_2 ≠ x_3. Thus σ ∘ τ ≠ τ ∘ σ. This is 

LEMMA 1.2.4 If S has more than two elements we can find two elements σ, 
τ in A(S) such that σ ∘ τ ≠ τ ∘ σ. 
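
For a small finite S one can list all of A(S) and inspect it. The following sketch is an illustration under our own assumptions (dicts for mappings, integer elements so that they can be sorted, and the names A, compose and all_commute chosen by us); it uses the standard library routine itertools.permutations to enumerate the rearrangements of S.

```python
from itertools import permutations


def A(S):
    """List every one-to-one mapping of the finite set S onto itself."""
    elements = sorted(S)
    return [dict(zip(elements, images)) for images in permutations(elements)]


def compose(sigma, tau):
    """Return sigma o tau in the sense of the text: first sigma, then tau."""
    return {s: tau[sigma[s]] for s in sigma}


def all_commute(S):
    """Return True if sigma o tau = tau o sigma for all sigma, tau in A(S)."""
    maps = A(S)
    return all(compose(s, t) == compose(t, s) for s in maps for t in maps)
```

With S = {1, 2} every pair of elements of A(S) commutes, while with S = {1, 2, 3} some pair does not, in line with Lemma 1.2.4. The list A({1, 2, 3}) has six entries and is closed under compose, as part 1 of Theorem 1.2.1 asserts.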


Problems 

1. In the following, where σ: S → T, determine whether the σ is onto 
and/or one-to-one and determine the inverse image of any t ∈ T 
under σ. 

(a) S = set of real numbers, T = set of nonnegative real numbers, 
sσ = s^2. 

(b) S = set of nonnegative real numbers, T = set of nonnegative real 
numbers, sσ = s^2. 

(c) S = set of integers, T = set of integers, sσ = s^2. 

(d) S = set of integers, T = set of integers, sσ = 2s. 

2. If S and T are nonempty sets, prove that there exists a one-to-one 
correspondence between S × T and T × S. 

3. If S, T, U are nonempty sets, prove that there exists a one-to-one 
correspondence between 

(a) (S x T) x U and S x (T x U). 

(b) Either set in part (a) and the set of ordered triples (s, t, u) where 
s ∈ S, t ∈ T, u ∈ U. 

4. (a) If there is a one-to-one correspondence between S and T, prove 
that there exists a one-to-one correspondence between T and S. 

(b) If there is a one-to-one correspondence between S and T and 
between T and U, prove that there is a one-to-one correspondence 
between S and U. 

5. If i is the identity mapping on S, prove that for any σ ∈ A(S), 
σ ∘ i = i ∘ σ = σ. 

*6. If S is any set, prove that it is impossible to find a mapping of S onto S*. 

7. If the set S has n elements, prove that A(S) has n! (n factorial) elements. 

8. If the set S has a finite number of elements, prove the following: 

(a) If σ maps S onto S, then σ is one-to-one. 

(b) If σ is a one-to-one mapping of S into itself, then σ is onto. 

(c) Prove, by example, that both part (a) and part (b) are false if S 
does not have a finite number of elements. 

9. Prove that the converses of both parts of Lemma 1.2.2 are false; namely, 

(a) If σ ∘ τ is onto, it need not be that both σ and τ are onto. 

(b) If σ ∘ τ is one-to-one, it need not be that both σ and τ are one-to-one. 

10. Prove that there is a one-to-one correspondence between the set of 
integers and the set of rational numbers. 

11. If σ: S → T and if A is a subset of S, the restriction of σ to A, σ_A, is 
defined by aσ_A = aσ for any a ∈ A. Prove 

(a) σ_A defines a mapping of A into T. 

(b) σ_A is one-to-one if σ is. 

(c) σ_A may very well be one-to-one even if σ is not. 

12. If σ: S → S and A is a subset of S such that Aσ ⊂ A, prove that 
(σ ∘ σ)_A = σ_A ∘ σ_A. 

13. A set S is said to be infinite if there is a one-to-one correspondence 
between S and a proper subset of S. Prove 

(a) The set of integers is infinite. 

(b) The set of real numbers is infinite. 

(c) If a set S has a subset A which is infinite, then S must be infinite. 
(Note: By the result of Problem 8, a set finite in the usual sense is not 
infinite.) 

14. If S is infinite and can be brought into one-to-one correspondence 
with the set of integers, prove that there is one-to-one correspondence 
between S and S x S. 

15. Given two sets S and T we declare S < T (S is smaller than T) if 
there is a mapping of T onto S but no mapping of S onto T. Prove that 
if S < T and T < U then S < U. 

16. If S and T are finite sets having m and n elements, respectively, prove 
that if m < n then S < T. 

1.3 The Integers 

We close this chapter with a brief discussion of the set of integers. We shall 
make no attempt to construct them axiomatically, assuming instead that we 
already have the set of integers and that we know many of the elementary 
facts about them. In this number we include the principle of mathematical 
induction (which will be used freely throughout the book) and the fact that 
a nonempty set of positive integers always contains a smallest element. As 
to notation, the familiar symbols: a > b, a < b, |a|, etc., will occur with 
their usual meaning. To avoid repeating that something is an integer, we 
make the assumption that all symbols, in this section, written as lowercase Latin 
letters will be integers. 

Given a and b, with b ≠ 0, we can divide a by b to get a nonnegative 
remainder r which is smaller in size than b; that is, we can find m and r 
such that a = mb + r where 0 ≤ r < |b|. This fact is known as the 
Euclidean algorithm and we assume familiarity with it. 
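
As a small illustration, the division with nonnegative remainder can be carried out in Python. The function name divide is ours. A little care is needed for negative b, since Python's built-in remainder takes the sign of the divisor; the sketch therefore reduces modulo |b| first and then recovers m.

```python
def divide(a, b):
    """Return (m, r) with a = m*b + r and 0 <= r < |b|, for b != 0."""
    if b == 0:
        raise ZeroDivisionError("b must be nonzero")
    r = a % abs(b)      # with a positive modulus, r lies in 0 <= r < |b|
    m = (a - r) // b    # exact, because b divides a - r
    return m, r
```

For example, divide(-7, 2) gives m = -4 and r = 1, since -7 = (-4)(2) + 1, and divide(7, -2) gives m = -3 and r = 1.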

We say that b ≠ 0 divides a if a = mb for some m. We denote that b 
divides a by b | a, and that b does not divide a by b ∤ a. Note that if a | 1 then 
a = ±1, that when both a | b and b | a, then a = ±b, and that any b 
divides 0. If b | a, we call b a divisor of a. Note that if b is a divisor of g 
and of h, then it is a divisor of mg + nh for arbitrary integers m and n. We 
leave the verification of these remarks as exercises. 

DEFINITION The positive integer c is said to be the greatest common divisor 
of a and b if 

1. c is a divisor of a and of b. 

2. Any divisor of a and b is a divisor of c. 

We shall use the notation (a, b) for the greatest common divisor of a and 
b. Since we insist that the greatest common divisor be positive, (a, b) = 
(a, -b) = (-a, b) = (-a, -b). For instance, (60, 24) = (60, -24) = 12. 
Another comment: The mere fact that we have defined what is to be meant 
by the greatest common divisor does not guarantee that it exists. This will 
have to be proved. However, we can say that if it exists then it is unique, 
for, if we had c_1 and c_2 satisfying both conditions of the definition above, 
then c_1 | c_2 and c_2 | c_1, whence we would have c_1 = ±c_2; the insistence on 
positivity would then force c_1 = c_2. Our first business at hand then is to 
dispose of the existence of (a, b). In doing so, in the next lemma, we actually 
prove a little more, namely that (a, b) must have a particular form. 

LEMMA 1.3.1 If a and b are integers, not both 0, then (a, b) exists; moreover, 
we can find integers m_0 and n_0 such that (a, b) = m_0 a + n_0 b. 

Proof. Let M be the set of all integers of the form ma + nb, where m 
and n range freely over the set of integers. Since one of a or b is not 0, there 
are nonzero integers in M. Because x = ma + nb is in M, -x = (-m)a + 
(-n)b is also in M; therefore, M always has in it some positive integers. 
But then there is a smallest positive integer, c, in M; being in M, c has the 
form c = m_0 a + n_0 b. We claim that c = (a, b). 

Note first that if d | a and d | b, then d | (m_0 a + n_0 b), whence d | c. We now 
must show that c | a and c | b. Given any element x = ma + nb in M, then 
by the Euclidean algorithm, x = tc + r where 0 ≤ r < c. Writing this 
out explicitly, ma + nb = t(m_0 a + n_0 b) + r, whence r = (m - tm_0)a + 
(n - tn_0)b and so must be in M. Since 0 ≤ r and r < c, by the choice of 
c, r = 0. Thus x = tc; we have proved that c | x for any x ∈ M. But 
a = 1a + 0b ∈ M and b = 0a + 1b ∈ M, whence c | a and c | b. 

We have shown that c satisfies the requisite properties to be (a, b) and 
so we have proved the lemma. 
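
The proof above shows that m_0 and n_0 exist but does not say how to find them. One standard way to compute them, which is not the argument of the proof but a different technique (usually called the extended Euclidean algorithm), is to carry the coefficients along while dividing repeatedly. The sketch below is offered only as an illustration; the function name and variable names are ours.

```python
def gcd_with_coefficients(a, b):
    """Return (c, m0, n0) with c = (a, b) > 0 and c = m0*a + n0*b.

    Throughout the loop old_r = old_m*a + old_n*b and r = m*a + n*b.
    """
    if a == 0 and b == 0:
        raise ValueError("a and b must not both be 0")
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    if old_r < 0:  # the greatest common divisor is required to be positive
        old_r, old_m, old_n = -old_r, -old_m, -old_n
    return old_r, old_m, old_n
```

For a = 60 and b = 24 this returns c = 12 with m_0 = 1 and n_0 = -2, and indeed 12 = 1(60) + (-2)(24). The coefficients are not unique; other pairs m_0, n_0 serve equally well.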

DEFINITION The integers a and b are relatively prime if (a, b) = 1. 

As an immediate consequence of Lemma 1.3.1, we have the 

COROLLARY If a and b are relatively prime, we can find integers m and n such 
that ma + nb = 1. 

We introduce another familiar notion, that of prime number. By this 
we shall mean an integer which has no nontrivial factorization. For technical 
reasons, we exclude 1 from the set of prime numbers. The sequence 2, 3, 5, 
7, 11,... are all prime numbers; equally, —2, —3, —5,... are prime 
numbers. Since, in factoring, the negative introduces no essential differences, 
for us prime numbers will always be positive. 

DEFINITION The integer p > 1 is a prime number if its only divisors are 
±1, ±p. 

Another way of putting this is to say that an integer p (larger than 1) is a 
prime number if and only if given any other integer n then either (p, n) = 1 
or p | n. As we shall soon see, the prime numbers are the building blocks of 
the integers. But first we need the important observation, 

LEMMA 1.3.2 If a is relatively prime to b but a | bc, then a | c. 

Proof. Since a and b are relatively prime, by the corollary to Lemma 
1.3.1, we can find integers m and n such that ma + nb = 1. Thus 
mac + nbc = c. Now a | mac and, by assumption, a | nbc; consequently, 
a | (mac + nbc). Since mac + nbc = c, we conclude that a | c, which is 
precisely the assertion of the lemma. 

Following immediately from the lemma and the definition of prime 
number is the important 

COROLLARY If a prime number divides the product of certain integers it must 
divide at least one of these integers. 

We leave the proof of the corollary to the reader. 

We have asserted that the prime numbers serve as the building blocks 
for the set of integers. The precise statement of this is the unique factorization 
theorem: 

THEOREM 1.3.1 Any positive integer a > 1 can be factored in a unique way 
as a = p_1^{α_1} p_2^{α_2} ⋯ p_t^{α_t}, where p_1 > p_2 > ⋯ > p_t are prime numbers and 
where each α_i > 0. 

Proof. The theorem as stated actually consists of two distinct subtheorems; 
the first asserts the possibility of factoring the given integer as a 
product of prime powers; the second assures us that this decomposition is 
unique. We shall prove the theorem itself by proving each of these subtheorems 
separately. 

An immediate question presents itself: How shall we go about proving 
the theorem? A natural method of attack is to use mathematical induction. 
A short word about this; we shall use the following version of mathematical 
induction: If the proposition P(m_0) is true and if the truth of P(r) for all r 
such that m_0 ≤ r < k implies the truth of P(k), then P(n) is true for all 
n ≥ m_0. This variant of induction can be shown to be a consequence of the 
basic property of the integers which asserts that any nonempty set of positive 
integers has a minimal element (see Problem 10). 

We first prove that every integer a > 1 can be factored as a product of 
prime powers; our approach is via mathematical induction. 

Certainly m_0 = 2, being a prime number, has a representation as a 
product of prime powers. 

Suppose that any integer r, 2 ≤ r < k can be factored as a product of 
prime powers. If k itself is a prime number, then it is a product of prime 
powers. If k is not a prime number, then k = uv, where 1 < u < k and 
1 < v < k. By the induction hypothesis, since both u and v are less than k, 
each of these can be factored as a product of prime powers. Thus k = uv 
is also such a product. We have shown that the truth of the proposition for 
all integers r, 2 ≤ r < k, implies its truth for k. Consequently, by the 
basic induction principle, the proposition is true for all integers n ≥ m_0 = 2; 
that is, every integer n ≥ 2 is a product of prime powers. 
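
The existence argument is not a practical recipe, but a factorization is easy to compute for small integers. The sketch below uses plain trial division, which is a different and more pedestrian technique than the induction in the proof; the function name factor is ours. It returns the primes in decreasing order, each with its exponent, in the shape used in the statement of Theorem 1.3.1.

```python
def factor(a):
    """Return [(p_1, e_1), ..., (p_t, e_t)] with p_1 > ... > p_t prime,
    each e_i > 0, and a equal to the product of the p_i ** e_i."""
    if a <= 1:
        raise ValueError("a must be an integer greater than 1")
    factors = []
    p = 2
    while p * p <= a:
        if a % p == 0:
            e = 0
            while a % p == 0:  # strip out every factor p
                a //= p
                e += 1
            factors.append((p, e))
        p += 1
    if a > 1:                  # whatever remains is itself a prime
        factors.append((a, 1))
    factors.reverse()          # largest prime first, as in the theorem
    return factors
```

For instance, factor(360) yields [(5, 1), (3, 2), (2, 3)], that is, 360 = 5 · 3^2 · 2^3.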

Now for the uniqueness. Here, too, we shall use mathematical induction, 
and in the form used above. Suppose that 

a = p_1^{α_1} p_2^{α_2} ⋯ p_r^{α_r} = q_1^{β_1} q_2^{β_2} ⋯ q_s^{β_s}, 

where p_1 > p_2 > ⋯ > p_r, q_1 > q_2 > ⋯ > q_s are prime numbers, and 
where each α_i > 0 and each β_i > 0. Our object is to prove 

1. r = s. 

2. p_1 = q_1, p_2 = q_2, ..., p_r = q_r. 

3. α_1 = β_1, α_2 = β_2, ..., α_r = β_r. 

For a = 2 this is clearly true. Proceeding by induction we suppose it to 
be true for all integers u, 2 ≤ u < a. Now, since 

a = p_1^{α_1} ⋯ p_r^{α_r} = q_1^{β_1} ⋯ q_s^{β_s} 

and since α_1 > 0, p_1 | a, hence p_1 | q_1^{β_1} ⋯ q_s^{β_s}. However, since p_1 is a 
prime number, by the corollary to Lemma 1.3.2, it follows easily that 
p_1 = q_i for some i. Thus q_1 ≥ q_i = p_1. Similarly, since q_1 | a we get 
q_1 = p_j for some j, whence p_1 ≥ p_j = q_1. In short, we have shown that 
p_1 = q_1. Therefore a = p_1^{α_1} p_2^{α_2} ⋯ p_r^{α_r} = p_1^{β_1} q_2^{β_2} ⋯ q_s^{β_s}. We claim that 
this forces α_1 = β_1. (Prove!) But then 

b = a / p_1^{α_1} = p_2^{α_2} ⋯ p_r^{α_r} = q_2^{β_2} ⋯ q_s^{β_s}. 

If b = 1, then α_2 = ⋯ = α_r = 0 and β_2 = ⋯ = β_s = 0; that is, 
r = s = 1, and we are done. If b > 1, then since b < a we can apply our 
induction hypothesis to b to get 

1. The number of distinct prime power factors (in b) on both sides is equal, 
that is, r - 1 = s - 1, hence r = s. 

2. α_2 = β_2, ..., α_r = β_r. 

3. p_2 = q_2, ..., p_r = q_r. 

Together with the information we already have obtained, namely, p_1 = q_1 
and α_1 = β_1, this is precisely what we were trying to prove. Thus we see 
that the assumption of the uniqueness of factorization for the integers less 
than a implied the uniqueness of factorization for a. In consequence, the 
induction is completed and the assertion of unique factorization is established. 

We change direction a little to study the important notion of congruence 
modulo a given integer. As we shall see later, the relation that we now 
introduce is a special case of a much more general one that can be defined 
in a much broader context. 

DEFINITION Let n > 0 be a fixed integer. We define a ≡ b mod n if 
n | (a - b). 

The relation is referred to as congruence modulo n, n is called the modulus of 
the relation, and we read a ≡ b mod n as “a is congruent to b modulo n.” 
Note, for example, that 73 ≡ 4 mod 23, 21 ≡ -9 mod 10, etc. 

This congruence relation enjoys the following basic properties: 

LEMMA 1.3.3 

1. The relation congruence modulo n defines an equivalence relation on the set of 
integers. 

2. This equivalence relation has n distinct equivalence classes. 

3. If a ≡ b mod n and c ≡ d mod n, then a + c ≡ b + d mod n and ac ≡ 
bd mod n. 

4. If ab ≡ ac mod n and a is relatively prime to n, then b ≡ c mod n. 

Proof. We first verify that the relation congruence modulo n is an 
equivalence relation. Since n | 0, we indeed have that n | (a - a) whence 
a ≡ a mod n for every a. Further, if a ≡ b mod n then n | (a - b), and so 
n | (b - a) = -(a - b); thus b ≡ a mod n. Finally, if a ≡ b mod n and 
b ≡ c mod n, then n | (a - b) and n | (b - c) whence n | {(a - b) + 
(b - c)}, that is, n | (a - c). This, of course, implies that a ≡ c mod n. 

Let the equivalence class, under this relation, of a be denoted by [a]; 
we call it the congruence class (mod n) of a. Given any integer a, by the 
Euclidean algorithm, a = kn + r where 0 ≤ r < n. But then, a ∈ [r] and 
so [a] = [r]. Thus there are at most n distinct congruence classes; namely, 
[0], [1], ..., [n - 1]. However, these are distinct, for if [i] = [j] with, 
say, 0 ≤ i < j < n, then n | (j - i) where j - i is a positive integer less 
than n, which is obviously impossible. Consequently, there are exactly the 
n distinct congruence classes [0], [1], ..., [n - 1]. We have now proved 
assertions 1 and 2 of the lemma. 

We now prove part 3. Suppose that a ≡ b mod n and c ≡ d mod n; 
therefore, n | (a - b) and n | (c - d) whence n | {(a - b) + (c - d)}, and 
so n | {(a + c) - (b + d)}. But then a + c ≡ b + d mod n. In addition, 
n | {(a - b)c + (c - d)b} = ac - bd, whence ac ≡ bd mod n. 

Finally, notice that if ab ≡ ac mod n and if a is relatively prime to n, 
then the fact that n | a(b - c), by Lemma 1.3.2, implies that n | (b - c) and 
so b ≡ c mod n. 

If a is not relatively prime to n, the result of part 4 may be false; for 
instance, 2·3 ≡ 4·3 mod 6, yet 2 ≢ 4 mod 6. 
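
The numerical instances quoted in this discussion are easily confirmed by machine. The sketch below is illustrative only; the function name congruent is ours, and it simply tests whether n divides a - b.

```python
def congruent(a, b, n):
    """Return True if a is congruent to b modulo n, that is, n | (a - b)."""
    if n <= 0:
        raise ValueError("the modulus n must be positive")
    return (a - b) % n == 0


# The two examples following the definition of congruence.
examples_hold = congruent(73, 4, 23) and congruent(21, -9, 10)

# The failure of cancellation when the common factor 3 is not
# relatively prime to the modulus 6.
products_agree = congruent(2 * 3, 4 * 3, 6)
factors_agree = congruent(2, 4, 6)
```

Here examples_hold and products_agree are true while factors_agree is false, which is the failure of part 4 described above.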

Lemma 1.3.3 opens certain interesting possibilities for us. Let J_n be the 
set of the congruence classes mod n; that is, J_n = {[0], [1], ..., [n - 1]}. 
Given two elements, [i] and [j] in J_n, let us define 

[i] + [j] = [i + j];    (a) 

[i][j] = [ij].    (b) 

We assert that the lemma assures us that this “addition” and “multiplication” 
are well defined; that is, if [i] = [i'] and [j] = [j'], then [i] + [j] = 
[i + j] = [i' + j'] = [i'] + [j'] and that [i][j] = [ij] = [i'j'] = [i'][j']. (Verify!) 

These operations in J_n have the following interesting properties (whose 
proofs we leave as exercises): for any [i], [j], [k] in J_n, 

1. [i] + [j] = [j] + [i] (commutative law). 

2. [i][j] = [j][i] (commutative law). 

3. ([i] + [j]) + [k] = [i] + ([j] + [k]) (associative law). 

4. ([i][j])[k] = [i]([j][k]) (associative law). 

5. [i]([j] + [k]) = [i][j] + [i][k] (distributive law). 

6. [0] + [i] = [i]. 

7. [1][i] = [i]. 

One more remark: if n = p is a prime number and if [a] ≠ [0] is in J_p, 
then there is an element [b] in J_p such that [a][b] = [1]. 

The set J_n plays an important role in algebra and number theory. It is 
called the set of integers mod n; before we proceed much further we will have 
become well acquainted with it. 
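
A first acquaintance with J_n can be made by computing in it. In the sketch below, which is only an illustration, each class [i] is represented by its remainder in 0, 1, ..., n - 1; the function names add, mul and inverse are ours, and inverse finds [b] with [a][b] = [1] by simply trying every class.

```python
def add(i, j, n):
    """Return the representative of [i] + [j] in J_n."""
    return (i + j) % n


def mul(i, j, n):
    """Return the representative of [i][j] in J_n."""
    return (i * j) % n


def inverse(a, n):
    """Return b in 0..n-1 with [a][b] = [1] in J_n, or None if none exists."""
    for b in range(n):
        if mul(a, b, n) == 1:
            return b
    return None
```

In J_7 every class other than [0] has an inverse, for example inverse(3, 7) is 5 since 3 · 5 = 15 ≡ 1 mod 7, whereas in J_6 the class [2] has none. Replacing i by i + 7k in add(i, j, 7) or mul(i, j, 7) does not change the answer, which is the well-definedness asserted above.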


Problems 

1. If a | b and b | a, show that a = ±b. 

2. If b is a divisor of g and of h, show it is a divisor of mg + nh. 

3. If a and b are integers, the least common multiple of a and b, written as 
[a, b], is defined as that positive integer d such that 

(a) a | d and b | d. 

(b) Whenever a | x and b | x then d | x. 

Prove that [a, b] exists and that [a, b] = ab/(a, b), if a > 0, b > 0. 

4. If a | x and b | x and (a, b) = 1 prove that (ab) | x. 

5. If a = p_1^{α_1} ⋯ p_k^{α_k} and b = p_1^{β_1} ⋯ p_k^{β_k} where the p_i are distinct 
prime numbers and where each α_i ≥ 0, β_i ≥ 0, prove 

(a) (a, b) = p_1^{δ_1} ⋯ p_k^{δ_k} where δ_i = minimum of α_i and β_i for each i. 

(b) [a, b] = p_1^{γ_1} ⋯ p_k^{γ_k} where γ_i = maximum of α_i and β_i for each i. 

6. Given a, b, on applying the Euclidean algorithm successively we have 

a = q_0 b + r_1, 0 ≤ r_1 < |b|, 

b = q_1 r_1 + r_2, 0 ≤ r_2 < r_1, 

r_1 = q_2 r_2 + r_3, 0 ≤ r_3 < r_2, 

... 

r_k = q_{k+1} r_{k+1} + r_{k+2}, 0 ≤ r_{k+2} < r_{k+1}. 

Since the integers r_k are decreasing and are all nonnegative, there is a 
first integer n such that r_{n+1} = 0. Prove that r_n = (a, b). (We 
consider, here, r_0 = |b|.) 

7. Use the method in Problem 6 to calculate 

(a) (1128, 33). (b) (6540, 1206). 

8. To check that $n$ is a prime number, prove that it is sufficient to show
that it is not divisible by any prime number $p$ such that $p \leq \sqrt{n}$.

9. Show that n > 1 is a prime number if and only if for any a either 
(a, n) = 1 or n | a. 

10. Assuming that any nonempty set of positive integers has a minimal 
element, prove 

(a) If the proposition $P$ is such that

(1) $P(m_0)$ is true,

(2) the truth of $P(m - 1)$ implies the truth of $P(m)$,

then $P(n)$ is true for all $n \geq m_0$.

(b) If the proposition $P$ is such that

(1) $P(m_0)$ is true,

(2) $P(m)$ is true whenever $P(a)$ is true for all $a$ such that
$m_0 \leq a < m$,

then $P(n)$ is true for all $n \geq m_0$.

11. Prove that the addition and multiplication used in J n are well defined. 

12. Prove the properties 1-7 for the addition and multiplication in J n . 

13. If $(a, n) = 1$, prove that one can find $[b] \in J_n$ such that $[a][b] = [1]$
in $J_n$.

*14. If $p$ is a prime number, prove that for any integer $a$, $a^p \equiv a \bmod p$.

15. If $(m, n) = 1$, given $a$ and $b$, prove that there exists an $x$ such that
$x \equiv a \bmod m$ and $x \equiv b \bmod n$.

16. Prove the corollary to Lemma 1.3.2.

17. Prove that $n > 1$ is a prime number if and only if in $J_n$, $[a][b] = [0]$
implies that $[a] = [0]$ or $[b] = [0]$.
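
The chain of divisions described in Problem 6 is easy to carry out mechanically. What follows is a sketch only, under the assumption that $b \neq 0$; the function names are ours. It records the remainders $r_0 = |b|, r_1, r_2, \ldots$ until a zero remainder appears and reports the last nonzero one.

```python
def euclid_remainders(a, b):
    """Return [r_0, r_1, ..., r_n, 0] with r_0 = |b|, as in Problem 6."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # Python's % with a positive modulus gives 0 <= r < |b|.
    rems = [abs(b), a % abs(b)]
    while rems[-1] != 0:
        rems.append(rems[-2] % rems[-1])
    return rems


def gcd_by_euclid(a, b):
    """Last nonzero remainder r_n of the chain."""
    return euclid_remainders(a, b)[-2]


chain = euclid_remainders(252, 198)
```

For 252 and 198 the remainders run 198, 54, 36, 18, 0, so the last nonzero remainder is 18. The code computes; it proves nothing, and the proof asked for in Problem 6 remains to be supplied.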







Supplementary Reading 

For sets and cardinal numbers: 

Birkhoff, G., and MacLane, S., A Brief Survey of Modern Algebra, 2nd ed. New York: 
The Macmillan Company, 1965. 



Group Theory 


In this chapter we shall embark on the study of the algebraic object 
known as a group which serves as one of the fundamental building 
blocks for the subject today called abstract algebra. In later chapters 
we shall have a look at some of the others such as rings, fields, vector 
spaces, and linear algebras. Aside from the fact that it has become 
traditional to consider groups at the outset, there are natural, cogent 
reasons for this choice. To begin with, groups, being one-operational 
systems, lend themselves to the simplest formal description. Yet 
despite this simplicity of description the fundamental algebraic concepts
such as homomorphism, quotient construction, and the like,
which play such an important role in all algebraic structures—in fact, 
in all of mathematics—already enter here in a pure and revealing form. 

At this point, before we become weighted down with details, let us 
take a quick look ahead. In abstract algebra we have certain basic 
systems which, in the history and development of mathematics, have 
achieved positions of paramount importance. These are usually sets 
on whose elements we can operate algebraically—by this we mean that 
we can combine two elements of the set, perhaps in several ways, to 
obtain a third element of the set—and, in addition, we assume that 
these algebraic operations are subject to certain rules, which are 
explicitly spelled out in what we call the axioms or postulates defining 
the system. In this abstract setting we then attempt to prove theorems 
about these very general structures, always hoping that when these 
results are applied to a particular, concrete realization of the abstract
system there will flow out facts and insights into the example at hand which
would have been obscured from us by the mass of inessential information
available to us in the particular, special case.

We should like to stress that these algebraic systems and the axioms 
which define them must have a certain naturality about them. They must 
come from the experience of looking at many examples; they should be rich 
in meaningful results. One does not just sit down, list a few axioms, and 
then proceed to study the system so described. This, admittedly, is done 
by some, but most mathematicians would dismiss these attempts as poor 
mathematics. The systems chosen for study are chosen because particular 
cases of these structures have appeared time and time again, because someone
finally noted that these special cases were indeed special instances of
a general phenomenon, because one notices analogies between two highly 
disparate mathematical objects and so is led to a search for the root of 
these analogies. To cite an example, case after case after case of the special 
object, which we know today as groups, was studied toward the end of 
the eighteenth, and at the beginning of the nineteenth, century, yet it was 
not until relatively late in the nineteenth century that the notion of an 
abstract group was introduced. The only algebraic structures, so far encountered,
that have stood the test of time and have survived to become
of importance, have been those based on a broad and tall pillar of special 
cases. Amongst mathematicians neither the beauty nor the significance of 
the first example which we have chosen to discuss—groups—is disputed. 

2.1 Definition of a Group 

At this juncture it is advisable to recall a situation discussed in the first 
chapter. For an arbitrary nonempty set S we defined A(S) to be the set of 
all one-to-one mappings of the set S onto itself. For any two elements $\sigma$,
$\tau \in A(S)$ we introduced a product, denoted by $\sigma \circ \tau$, and on further investigation
it turned out that the following facts were true for the elements of
$A(S)$ subject to this product:

1. Whenever $\sigma, \tau \in A(S)$, then it follows that $\sigma \circ \tau$ is also in $A(S)$. This is
described by saying that $A(S)$ is closed under the product (or, sometimes,
as closed under multiplication).

2. For any three elements $\sigma, \tau, \mu \in A(S)$, $\sigma \circ (\tau \circ \mu) = (\sigma \circ \tau) \circ \mu$. This
relation is called the associative law.

3. There is a very special element $\iota \in A(S)$ which satisfies $\iota \circ \sigma = \sigma \circ \iota = \sigma$
for all $\sigma \in A(S)$. Such an element is called an identity element for $A(S)$.

4. For every $\sigma \in A(S)$ there is an element, written as $\sigma^{-1}$, also in $A(S)$,
such that $\sigma \circ \sigma^{-1} = \sigma^{-1} \circ \sigma = \iota$. This is usually described by saying
that every element in $A(S)$ has an inverse in $A(S)$.

One other fact about $A(S)$ stands out, namely, that whenever S has
three or more elements we can find two elements $\alpha, \beta \in A(S)$ such that
$\alpha \circ \beta \neq \beta \circ \alpha$. This possibility, which runs counter to our usual experience
and intuition in mathematics so far, introduces a richness into $A(S)$ which
would not have been present except for it.

With this example as a model, and with a great deal of hindsight, we 
abstract and make the 

DEFINITION A nonempty set of elements G is said to form a group if in
G there is defined a binary operation, called the product and denoted by $\cdot$,
such that

1. $a, b \in G$ implies that $a \cdot b \in G$ (closed).

2. $a, b, c \in G$ implies that $a \cdot (b \cdot c) = (a \cdot b) \cdot c$ (associative law).

3. There exists an element $e \in G$ such that $a \cdot e = e \cdot a = a$ for all $a \in G$
(the existence of an identity element in G).

4. For every $a \in G$ there exists an element $a^{-1} \in G$ such that
$a \cdot a^{-1} = a^{-1} \cdot a = e$ (the existence of inverses in G).

Considering the source of this definition it is not surprising that for every 
nonempty set S the set A(S) is a group. Thus we already have presented to 
us an infinite source of interesting, concrete groups. We shall see later (in a 
theorem due to Cayley) that these A(S)’s constitute, in some sense, a 
universal family of groups. If S has three or more elements, recall that we
can find elements $\sigma, \tau \in A(S)$ such that $\sigma \circ \tau \neq \tau \circ \sigma$. This prompts us to
single out a highly special, but very important, class of groups as in the 
next definition. 

DEFINITION A group G is said to be abelian (or commutative) if for every
$a, b \in G$, $a \cdot b = b \cdot a$.

A group which is not abelian is called, naturally enough, non-abelian ; 
having seen a family of examples of such groups we know that non-abelian 
groups do indeed exist. 

Another natural characteristic of a group G is the number of elements it 
contains. We call this the order of G and denote it by $o(G)$. This number is,
of course, most interesting when it is finite. In that case we say that G is a 
finite group. 

To see that finite groups which are not trivial do exist just note that if the
set S contains n elements, then the group $A(S)$ has $n!$ elements. (Prove!)
This highly important example will be denoted by $S_n$ whenever it appears
in this book, and will be called the symmetric group of degree n. In the next
section we shall more or less dissect $S_3$, which is a non-abelian group of
order 6.

2.2 Some Examples of Groups

Example 2.2.1 Let G consist of the integers $0, \pm 1, \pm 2, \ldots$ where we
mean by $a \cdot b$ for $a, b \in G$ the usual sum of integers, that is, $a \cdot b = a + b$.
Then the reader can quickly verify that G is an infinite abelian group in
which 0 plays the role of $e$ and $-a$ that of $a^{-1}$.

Example 2.2.2 Let G consist of the real numbers $1, -1$ under the
multiplication of real numbers. G is then an abelian group of order 2.

Example 2.2.3 Let $G = S_3$, the group of all 1-1 mappings of the set
$\{x_1, x_2, x_3\}$ onto itself, under the product which we defined in Chapter 1.
G is a group of order 6. We digress a little before returning to $S_3$.


For a neater notation, not just in $S_3$, but in any group G, let us define for
any $a \in G$, $a^0 = e$, $a^1 = a$, $a^2 = a \cdot a$, $a^3 = a \cdot a^2$, ..., $a^k = a \cdot a^{k-1}$, and
$a^{-2} = (a^{-1})^2$, $a^{-3} = (a^{-1})^3$, etc. The reader may verify that the usual
rules of exponents prevail; namely, for any two integers (positive, negative,
or zero) m, n,

$a^m \cdot a^n = a^{m+n}$, (1)

$(a^m)^n = a^{mn}$. (2)

(It is worthwhile noting that, in this notation, if G is the group of Example
2.2.1, $a^n$ means the integer $na$.)

With this notation at our disposal let us examine $S_3$ more closely. Consider
the mapping $\phi$ defined on the set $x_1, x_2, x_3$ by

$\phi$: $x_1 \to x_2$, $x_2 \to x_1$, $x_3 \to x_3$,

and the mapping

$\psi$: $x_1 \to x_2$, $x_2 \to x_3$, $x_3 \to x_1$.

Checking, we readily see that $\phi^2 = e$, $\psi^3 = e$, and that

$\phi \cdot \psi$: $x_1 \to x_3$, $x_2 \to x_2$, $x_3 \to x_1$,

whereas

$\psi \cdot \phi$: $x_1 \to x_1$, $x_2 \to x_3$, $x_3 \to x_2$.

It is clear that $\phi \cdot \psi \neq \psi \cdot \phi$ for they do not take $x_1$ into the same image.

Since $\psi^3 = e$, it follows that $\psi^{-1} = \psi^2$. Let us now compute the action
of $\psi^{-1} \cdot \phi$ on $x_1, x_2, x_3$. Since $\psi^{-1} = \psi^2$ and

$\psi^2$: $x_1 \to x_3$, $x_2 \to x_1$, $x_3 \to x_2$,

we have that

$\psi^{-1} \cdot \phi$: $x_1 \to x_3$, $x_2 \to x_2$, $x_3 \to x_1$.

In other words, $\phi \cdot \psi = \psi^{-1} \cdot \phi$. Consider the elements $e$, $\phi$, $\psi$, $\psi^2$, $\phi \cdot \psi$,
$\psi \cdot \phi$; these are all distinct and are in G (since G is closed), which only has
six elements. Thus this list enumerates all the elements of G. One might ask,
for instance, What is the entry in the list for $\psi \cdot (\phi \cdot \psi)$? Using $\phi \cdot \psi = \psi^{-1} \cdot \phi$,
we see that $\psi \cdot (\phi \cdot \psi) = \psi \cdot (\psi^{-1} \cdot \phi) = (\psi \cdot \psi^{-1}) \cdot \phi = e \cdot \phi = \phi$. Of more
interest is the form of $(\phi \cdot \psi) \cdot (\psi \cdot \phi) = \phi \cdot (\psi \cdot (\psi \cdot \phi)) = \phi \cdot (\psi^2 \cdot \phi) =
\phi \cdot (\psi^{-1} \cdot \phi) = \phi \cdot (\phi \cdot \psi) = \phi^2 \cdot \psi = e \cdot \psi = \psi$. (The reader should not be
frightened by the long, wearisome chain of equalities here. It is the last
time we shall be so boringly conscientious.) Using the same techniques as
we have used, the reader can compute to his heart’s content others of the
25 products which do not involve $e$. Some of these will appear in the
exercises.
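
The computations just made in $S_3$ can be checked by machine. The sketch below rests on one assumption of ours about representation: a mapping is stored as a tuple whose entry in position $i$ is the index of the image of $x_{i+1}$ (indices run 0, 1, 2), and the product $s \cdot t$ means "apply $s$ first, then $t$," in agreement with the convention of the text. The names `mul` and `power` are illustrative only.

```python
from itertools import permutations


def mul(s, t):
    """Product s . t in the text's convention: apply s first, then t."""
    return tuple(t[s[i]] for i in range(len(s)))


def power(s, k):
    """s multiplied by itself k times (k >= 0); power(s, 0) is the identity."""
    result = tuple(range(len(s)))
    for _ in range(k):
        result = mul(result, s)
    return result


e = (0, 1, 2)
phi = (1, 0, 2)   # x1 -> x2, x2 -> x1, x3 -> x3
psi = (1, 2, 0)   # x1 -> x2, x2 -> x3, x3 -> x1

psi_inv = power(psi, 2)  # since psi^3 = e, psi^2 is the inverse of psi
elements = {e, phi, psi, psi_inv, mul(phi, psi), mul(psi, phi)}

# The two long chains of equalities worked out in the text.
first = mul(psi, mul(phi, psi))
second = mul(mul(phi, psi), mul(psi, phi))
```

Here `first` comes out as `phi` and `second` as `psi`, in agreement with the text, and the six listed elements exhaust all the permutations of three symbols.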

Example 2.2.4 Let n be any positive integer. We construct a group of order n
as follows: G will consist of all symbols $a^i$, $i = 0, 1, 2, \ldots, n - 1$ where
we insist that $a^0 = a^n = e$, $a^i \cdot a^j = a^{i+j}$ if $i + j \leq n$ and $a^i \cdot a^j = a^{i+j-n}$
if $i + j > n$. The reader may verify that this is a group. It is called a
cyclic group of order n.

A geometric realization of the group in Example 2.2.4 may be achieved
as follows: Let S be the circle, in the plane, of radius 1, and let $\rho_n$ be a
rotation through an angle of $2\pi/n$. Then $\rho_n \in A(S)$ and $\rho_n$ in $A(S)$ generates
a group of order n, namely, $\{e, \rho_n, \rho_n^2, \ldots, \rho_n^{n-1}\}$.
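
The verification left to the reader in Example 2.2.4 can be carried out by brute force for any particular n. In the sketch below, which is ours and not part of the construction, the symbol $a^i$ is represented simply by its exponent $i$, and `is_group` tests the four axioms directly on a finite set.

```python
def cyclic_product(i, j, n):
    """Exponent of a^i . a^j, using a^n = a^0 = e."""
    s = i + j
    return s if s < n else s - n


def is_group(elements, op, identity):
    """Brute-force check of the four group axioms on a finite set."""
    elements = list(elements)
    closed = all(op(a, b) in elements for a in elements for b in elements)
    associative = all(
        op(a, op(b, c)) == op(op(a, b), c)
        for a in elements for b in elements for c in elements
    )
    has_identity = all(op(a, identity) == a == op(identity, a) for a in elements)
    has_inverses = all(
        any(op(a, b) == identity == op(b, a) for b in elements)
        for a in elements
    )
    return closed and associative and has_identity and has_inverses


n = 6
cyclic_ok = is_group(range(n), lambda i, j: cyclic_product(i, j, n), 0)
# Subtraction mod n on the same set is not a group operation.
subtraction_ok = is_group(range(n), lambda i, j: (i - j) % n, 0)
```

Such a check settles one value of n at a time; the general verification still calls for an argument.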

Example 2.2.5 Let S be the set of integers and, as usual, let $A(S)$ be
the set of all one-to-one mappings of S onto itself. Let G be the set of all
elements in $A(S)$ which move only a finite number of elements of S; that is,
$\sigma \in G$ if and only if the number of $x$ in S such that $x\sigma \neq x$ is finite. If
$\sigma, \tau \in G$, let $\sigma \cdot \tau$ be the product of $\sigma$ and $\tau$ as elements of $A(S)$. We claim
that G is a group relative to this operation. We verify this now.

To begin with, if $\sigma, \tau \in G$, then $\sigma$ and $\tau$ each moves only a finite number
of elements of S. In consequence, $\sigma \cdot \tau$ can possibly move only those elements
in S which are moved by at least one of $\sigma$ or $\tau$. Hence $\sigma \cdot \tau$ moves only a
finite number of elements in S; this puts $\sigma \cdot \tau$ in G. The identity element, $\iota$,
of $A(S)$ moves no element of S; thus $\iota$ certainly must be in G. Since the
associative law holds universally in $A(S)$, it holds for elements of G. Finally,
if $\sigma \in G$ and $x\sigma^{-1} \neq x$ for some $x \in S$, then $(x\sigma^{-1})\sigma \neq x\sigma$, which is to say,
$x(\sigma^{-1} \cdot \sigma) \neq x\sigma$. This works out to say merely that $x \neq x\sigma$. In other
words, $\sigma^{-1}$ moves only those elements of S which are moved by $\sigma$. Because
$\sigma$ only moves a finite number of elements of S, this is also true for $\sigma^{-1}$.
Therefore $\sigma^{-1}$ must be in G.

We have verified that G satisfies the requisite four axioms which define a
group, relative to the operation we specified. Thus G is a group. The reader
should verify that G is an infinite, non-abelian group.


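# Example 2.2.6 Let G be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where
$a, b, c, d$ are real numbers, such that $ad - bc \neq 0$. For the operation in G
we use the multiplication of matrices; that is,

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot \begin{pmatrix} w & x \\ y & z \end{pmatrix} = \begin{pmatrix} aw + by & ax + bz \\ cw + dy & cx + dz \end{pmatrix}.$$

The entries of this 2 x 2 matrix are clearly real. To see that this matrix is
in G we merely must show that

$$(aw + by)(cx + dz) - (ax + bz)(cw + dy) \neq 0$$

(this is the required relation on the entries of a matrix which puts it in G).
A short computation reveals that

$$(aw + by)(cx + dz) - (ax + bz)(cw + dy) = (ad - bc)(wz - xy) \neq 0$$

since both $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $\begin{pmatrix} w & x \\ y & z \end{pmatrix}$
are in G. The associative law of multiplication holds in matrices; therefore
it holds in G. The element

$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

is in G, since $1 \cdot 1 - 0 \cdot 0 = 1 \neq 0$; moreover, as the reader knows, or
can verify, $I$ acts as an identity element relative to the operation of G.

Finally, if $\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G$ then, since $ad - bc \neq 0$, the matrix

$$\begin{pmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\ \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{pmatrix}$$

makes sense. Moreover,

$$\left(\frac{d}{ad - bc}\right)\left(\frac{a}{ad - bc}\right) - \left(\frac{-b}{ad - bc}\right)\left(\frac{-c}{ad - bc}\right) = \frac{ad - bc}{(ad - bc)^2} = \frac{1}{ad - bc} \neq 0,$$

hence this matrix is in G. An easy computation shows that

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\ \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} \dfrac{d}{ad - bc} & \dfrac{-b}{ad - bc} \\ \dfrac{-c}{ad - bc} & \dfrac{a}{ad - bc} \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix};$$

thus this element of G acts as the inverse of $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$. In short, G is a group.

It is easy to see that G is an infinite, non-abelian group.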
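
A numerical spot check of Example 2.2.6 may reassure the skeptical reader. The sketch below is ours: it stores a matrix as a pair of rows, uses exact rational arithmetic so that no rounding intrudes, and the two sample matrices `A` and `B` are chosen arbitrarily for illustration.

```python
from fractions import Fraction


def det(m):
    """The quantity ad - bc for the matrix ((a, b), (c, d))."""
    (a, b), (c, d) = m
    return a * d - b * c


def matmul(m, n):
    """Product of two 2 x 2 matrices, as defined in the example."""
    (a, b), (c, d) = m
    (w, x), (y, z) = n
    return ((a * w + b * y, a * x + b * z),
            (c * w + d * y, c * x + d * z))


def inverse(m):
    """The matrix with entries d/(ad-bc), -b/(ad-bc), -c/(ad-bc), a/(ad-bc)."""
    (a, b), (c, d) = m
    delta = det(m)
    if delta == 0:
        raise ValueError("matrix is not in G: ad - bc = 0")
    return ((d / delta, -b / delta), (-c / delta, a / delta))


I = ((Fraction(1), Fraction(0)), (Fraction(0), Fraction(1)))
A = ((Fraction(1), Fraction(2)), (Fraction(3), Fraction(4)))
B = ((Fraction(0), Fraction(1)), (Fraction(1), Fraction(1)))

AB = matmul(A, B)
BA = matmul(B, A)
```

For these two matrices `AB` and `BA` differ, which exhibits the failure of commutativity, while `det(AB)` equals `det(A) * det(B)`, as the short computation in the text predicts.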


# Example 2.2.7 Let G be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where
$a, b, c, d$ are real numbers such that $ad - bc = 1$. Define the operation $\cdot$ in
G, as we did in Example 2.2.6, via the multiplication of matrices. We
leave it to the reader to verify that G is a group. It is, in fact, an infinite,
non-abelian group.


One should make a comment about the relationship of the group in 
Example 2.2.7 to that in Example 2.2.6. Clearly, the group of Example 2.2.7 
is a subset of that in Example 2.2.6. However, more is true. Relative to the 
same operation, as an entity in its own right, it forms a group. One could 
describe the situation by declaring it to be a subgroup of the group of Example 
2.2.6. We shall see much more about the concept of subgroup in a few 
pages. 

# Example 2.2.8 Let G be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$,
where $a$ and $b$ are real numbers, not both 0. (We can state this more
succinctly by saying that $a^2 + b^2 \neq 0$.) Using the same operation as in
the preceding two examples, we can easily show that G becomes a group.
In fact, G is an infinite, abelian group.

Does the multiplication in G remind you of anything? Write $\begin{pmatrix} a & b \\ -b & a \end{pmatrix}$
as $aI + bJ$ where $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and compute the product in these terms.
Perhaps that will ring a bell with you.


# Example 2.2.9 Let G be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where
$a, b, c, d$ are integers modulo $p$, $p$ a prime number, such that $ad - bc \neq 0$.
Define the multiplication in G as we did in Example 2.2.6, understanding
the multiplication and addition of the entries to be those modulo $p$. We
leave it to the reader to verify that G is a non-abelian finite group.


In fact, how many elements does G have? Perhaps it might be instructive
for the reader to try the early cases $p = 2$ and $p = 3$. Here one can write
down all the elements of G explicitly. (A word of warning! For $p = 3$,
G already has 48 elements.) To get the case of a general prime $p$ will require
an idea rather than a direct hacking-out of the answer. Try it!
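
For the two early cases the "direct hacking-out" can be delegated to a machine. The sketch below is ours; it simply runs through all $p^4$ choices of entries and counts those with $ad - bc \neq 0$ modulo $p$. It is no substitute for the idea needed in the general case.

```python
from itertools import product


def count_invertible(p):
    """Number of 2 x 2 matrices over the integers mod p with ad - bc != 0."""
    return sum(
        1
        for a, b, c, d in product(range(p), repeat=4)
        if (a * d - b * c) % p != 0
    )


small_cases = {p: count_invertible(p) for p in (2, 3)}
```

The count for $p = 3$ agrees with the warning given above.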


2.3 Some Preliminary Lemmas 

We have now been exposed to the theory of groups for several pages and as 
yet not a single, solitary fact has been proved about groups. It is high time 
to remedy this situation. Although the first few results we demonstrate are, 
admittedly, not very exciting (in fact, they are rather dull) they will be 
extremely useful. Learning the alphabet was probably not the most interesting 
part of our childhood education, yet, once this hurdle was cleared, fascinating 
vistas were opened before us. 

We begin with 

LEMMA 2.3.1 If G is a group, then

a. The identity element of G is unique.

b. Every $a \in G$ has a unique inverse in G.

c. For every $a \in G$, $(a^{-1})^{-1} = a$.

d. For all $a, b \in G$, $(a \cdot b)^{-1} = b^{-1} \cdot a^{-1}$.

Proof. Before we proceed with the proof itself it might be advisable to
see what it is that we are going to prove. In part (a) we want to show that if
two elements $e$ and $f$ in G enjoy the property that for every $a \in G$,
$a = a \cdot e = e \cdot a = a \cdot f = f \cdot a$, then $e = f$. In part (b) our aim is to show that if
$x \cdot a = a \cdot x = e$ and $y \cdot a = a \cdot y = e$, where all of $a, x, y$ are in G, then
$x = y$.

First let us consider part (a). Since $e \cdot a = a$ for every $a \in G$, then, in
particular, $e \cdot f = f$. But, on the other hand, since $b \cdot f = b$ for every
$b \in G$, we must have that $e \cdot f = e$. Piecing these two bits of information
together we obtain $f = e \cdot f = e$, and so $e = f$.

Rather than proving part (b), we shall prove something stronger which
immediately will imply part (b) as a consequence. Suppose that for $a$ in G,
$a \cdot x = e$ and $a \cdot y = e$; then, obviously, $a \cdot x = a \cdot y$. Let us make this our
starting point, that is, assume that $a \cdot x = a \cdot y$ for $a, x, y$ in G. There is an
element $b \in G$ such that $b \cdot a = e$ (as far as we know yet there may be
several such $b$'s). Thus $b \cdot (a \cdot x) = b \cdot (a \cdot y)$; using the associative law this
leads to

$x = e \cdot x = (b \cdot a) \cdot x = b \cdot (a \cdot x) = b \cdot (a \cdot y) = (b \cdot a) \cdot y = e \cdot y = y$.

We have, in fact, proved that $a \cdot x = a \cdot y$ in a group G forces $x = y$.
Similarly we can prove that $x \cdot a = y \cdot a$ implies that $x = y$. This says that
we can cancel, from the same side, in equations in groups. A note of caution,
however, for we cannot conclude that $a \cdot x = y \cdot a$ implies $x = y$ for we have
no way of knowing whether $a \cdot x = x \cdot a$. This is illustrated in $S_3$ with $a = \phi$,
$x = \psi$, $y = \psi^{-1}$.

Part (c) follows from this by noting that $a^{-1} \cdot (a^{-1})^{-1} = e = a^{-1} \cdot a$;
canceling off the $a^{-1}$ on the left leaves us with $(a^{-1})^{-1} = a$. This is the
analog in general groups of the familiar result $-(-5) = 5$, say, in the
group of real numbers under addition.

Part (d) is the most trivial of these, for

$(a \cdot b) \cdot (b^{-1} \cdot a^{-1}) = a \cdot ((b \cdot b^{-1}) \cdot a^{-1}) = a \cdot (e \cdot a^{-1}) = a \cdot a^{-1} = e$,

and so by the very definition of the inverse, $(a \cdot b)^{-1} = b^{-1} \cdot a^{-1}$.
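
The statements just proved can be tested exhaustively in $S_3$. In the sketch below, which is an illustration of ours and proves nothing in general, a mapping is a tuple of image indices and the product $s \cdot t$ means "apply $s$ first, then $t$"; the helper names are hypothetical.

```python
from itertools import permutations


def mul(s, t):
    """Product s . t: apply s first, then t."""
    return tuple(t[s[i]] for i in range(len(s)))


def inv(s):
    """Inverse mapping: send the image of i back to i."""
    result = [0] * len(s)
    for i, image in enumerate(s):
        result[image] = i
    return tuple(result)


G = list(permutations(range(3)))
phi = (1, 0, 2)   # x1 -> x2, x2 -> x1, x3 -> x3
psi = (1, 2, 0)   # x1 -> x2, x2 -> x3, x3 -> x1


def left_cancellation_holds(G):
    """a . x = a . y forces x = y, checked over all triples."""
    return all(
        x == y
        for a in G for x in G for y in G
        if mul(a, x) == mul(a, y)
    )


def inverse_of_product_rule_holds(G):
    """(a . b)^(-1) = b^(-1) . a^(-1), checked over all pairs."""
    return all(inv(mul(a, b)) == mul(inv(b), inv(a)) for a in G for b in G)


# The note of caution: a . x = y . a with x different from y.
caution = mul(phi, psi) == mul(inv(psi), phi) and psi != inv(psi)
```

Both checks succeed on $S_3$, and `caution` confirms the example given in the text with $a = \phi$, $x = \psi$, $y = \psi^{-1}$.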

Certain results obtained in the proof just given are important enough to 
single out and we do so now in 


LEMMA 2.3.2 Given $a, b$ in the group G, then the equations $a \cdot x = b$ and
$y \cdot a = b$ have unique solutions for $x$ and $y$ in G. In particular, the two cancellation
laws,

$a \cdot u = a \cdot w$ implies $u = w$

and

$u \cdot a = w \cdot a$ implies $u = w$

hold in G.



The few details needed for the proof of this lemma are left to the reader.

Problems

1. In the following determine whether the systems described are groups.
If they are not, point out which of the group axioms fail to hold.

(a) G = set of all integers, $a \cdot b = a - b$.

(b) G = set of all positive integers, $a \cdot b = ab$, the usual product of
integers.

(c) G = $a_0, a_1, \ldots, a_6$ where

$a_i \cdot a_j = a_{i+j}$ if $i + j < 7$,

$a_i \cdot a_j = a_{i+j-7}$ if $i + j \geq 7$

(for instance, $a_5 \cdot a_4 = a_{5+4-7} = a_2$ since $5 + 4 = 9 > 7$).

(d) G = set of all rational numbers with odd denominators, $a \cdot b = a + b$,
the usual addition of rational numbers.

2. Prove that if G is an abelian group, then for all $a, b \in G$ and all integers
$n$, $(a \cdot b)^n = a^n \cdot b^n$.

3. If G is a group such that $(a \cdot b)^2 = a^2 \cdot b^2$ for all $a, b \in G$, show that
G must be abelian.

*4. If G is a group in which $(a \cdot b)^i = a^i \cdot b^i$ for three consecutive integers
$i$ for all $a, b \in G$, show that G is abelian.

5. Show that the conclusion of Problem 4 does not follow if we assume
the relation $(a \cdot b)^i = a^i \cdot b^i$ for just two consecutive integers.

6. In $S_3$ give an example of two elements $x, y$ such that $(x \cdot y)^2 \neq x^2 \cdot y^2$.

7. In $S_3$ show that there are four elements satisfying $x^2 = e$ and three
elements satisfying $y^3 = e$.

8. If G is a finite group, show that there exists a positive integer $N$ such
that $a^N = e$ for all $a \in G$.

9. (a) If the group G has three elements, show it must be abelian.

(b) Do part (a) if G has four elements.

(c) Do part (a) if G has five elements.

10. Show that if every element of the group G is its own inverse, then G
is abelian.

11. If G is a group of even order, prove it has an element $a \neq e$ satisfying
$a^2 = e$.

12. Let G be a nonempty set closed under an associative product, which
in addition satisfies:

(a) There exists an $e \in G$ such that $a \cdot e = a$ for all $a \in G$.

(b) Given $a \in G$, there exists an element $y(a) \in G$ such that $a \cdot y(a) = e$.

Prove that G must be a group under this product.

13. Prove, by an example, that the conclusion of Problem 12 is false if
we assume instead:

(a') There exists an $e \in G$ such that $a \cdot e = a$ for all $a \in G$.

(b') Given $a \in G$, there exists $y(a) \in G$ such that $y(a) \cdot a = e$.

14. Suppose a finite set G is closed under an associative product and that
both cancellation laws hold in G. Prove that G must be a group.

15. (a) Using the result of Problem 14, prove that the nonzero integers
modulo $p$, $p$ a prime number, form a group under multiplication
mod $p$.

(b) Do part (a) for the nonzero integers relatively prime to $n$ under
multiplication mod $n$.

16. In Problem 14 show by an example that if one just assumed one of
the cancellation laws, then the conclusion need not follow.

17. Prove that in Problem 14 infinite examples exist, satisfying the
conditions, which are not groups.

18. For any $n > 2$ construct a non-abelian group of order $2n$. (Hint:
imitate the relations in $S_3$.)

19. If S is a set closed under an associative operation, prove that no
matter how you bracket $a_1 a_2 \cdots a_n$, retaining the order of the
elements, you get the same element in S (e.g., $(a_1 \cdot a_2) \cdot (a_3 \cdot a_4) =
a_1 \cdot (a_2 \cdot (a_3 \cdot a_4))$; use induction on $n$).

#20. Let G be the set of all real 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, where $ad - bc \neq 0$
is a rational number. Prove that G forms a group under matrix
multiplication.

#21. Let G be the set of all real 2 x 2 matrices $\begin{pmatrix} a & b \\ 0 & d \end{pmatrix}$ where $ad \neq 0$.
Prove that G forms a group under matrix multiplication. Is G
abelian?

#22. Let G be the set of all real 2 x 2 matrices $\begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix}$ where $a \neq 0$.
Prove that G is an abelian group under matrix multiplication.

#23. Construct in the G of Problem 21 a subgroup of order 4.

#24. Let G be the set of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where $a, b, c, d$ are
integers modulo 2, such that $ad - bc \neq 0$. Using matrix multiplication
as the operation in G, prove that G is a group of order 6.

#25. (a) Let G be the group of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where
$ad - bc \neq 0$ and $a, b, c, d$ are integers modulo 3, relative to
matrix multiplication. Show that $o(G) = 48$.

(b) If we modify the example of G in part (a) by insisting that
$ad - bc = 1$, then what is $o(G)$?

#*26. (a) Let G be the group of all 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ where $a, b, c, d$
are integers modulo $p$, $p$ a prime number, such that $ad - bc \neq 0$.
G forms a group relative to matrix multiplication. What is $o(G)$?

(b) Let H be the subgroup of the G of part (a) defined by

$$H = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in G \;\middle|\; ad - bc = 1 \right\}.$$

What is $o(H)$?


2.4 Subgroups 

Before turning to the study of groups we should like to change our notation 
slightly. It is cumbersome to keep using the • for the group operation; 
henceforth we shall drop it and instead of writing a • b for a, b e G we shall 
simply denote this product as ab. 

In general we shall not be interested in arbitrary subsets of a group G for 
they do not reflect the fact that G has an algebraic structure imposed on it. 
Whatever subsets we do consider will be those endowed with algebraic 
properties derived from those of G. The most natural such subsets are 
introduced in the 

DEFINITION A nonempty subset H of a group G is said to be a subgroup 
of G if, under the product in G, H itself forms a group. 

The following remark is clear: if H is a subgroup of G and K is a subgroup 
of H, then K is a subgroup of G. 

It would be useful to have some criterion for deciding whether a given 
subset of a group is a subgroup. This is the purpose of the next two lemmas. 

LEMMA 2.4.1 A nonempty subset H of the group G is a subgroup of G if and
only if

1. $a, b \in H$ implies that $ab \in H$.

2. $a \in H$ implies that $a^{-1} \in H$.

Proof. If H is a subgroup of G, then it is obvious that (1) and (2) must
hold.

Suppose conversely that H is a subset of G for which (1) and (2) hold.
In order to establish that H is a subgroup, all that is needed is to verify that
$e \in H$ and that the associative law holds for elements of H. Since the associative
law does hold for G, it holds all the more so for H, which is a
subset of G. If $a \in H$, by part 2, $a^{-1} \in H$ and so by part 1, $e = aa^{-1} \in H$.
This completes the proof.

In the special case of a finite group the situation becomes even nicer for 
there we can dispense with part 2. 

LEMMA 2.4.2 If H is a nonempty finite subset of a group G and H is closed
under multiplication, then H is a subgroup of G.

Proof. In light of Lemma 2.4.1 we need but show that whenever $a \in H$,
then $a^{-1} \in H$. Suppose that $a \in H$; thus $a^2 = aa \in H$, $a^3 = a^2 a \in H$,
..., $a^m \in H$, ... since H is closed. Thus the infinite collection of elements
$a, a^2, \ldots, a^m, \ldots$ must all fit into H, which is a finite subset of G. Thus
there must be repetitions in this collection of elements; that is, for some
integers $r, s$ with $r > s > 0$, $a^r = a^s$. By the cancellation in G, $a^{r-s} = e$
(whence $e$ is in H); since $r - s - 1 \geq 0$, $a^{r-s-1} \in H$ and $a^{-1} = a^{r-s-1}$
since $aa^{r-s-1} = a^{r-s} = e$. Thus $a^{-1} \in H$, completing the proof of the
lemma.

The lemma tells us that to check whether a subset of a finite group is a 
subgroup we just see whether or not it is closed under multiplication. 
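
The proof of Lemma 2.4.2 is constructive enough to be followed step by step on a machine. The sketch below is ours and makes two assumptions for illustration: the ambient group is the nonzero integers modulo 7 under multiplication, and the helper names are hypothetical. It lists the powers $a, a^2, a^3, \ldots$ until a repetition $a^r = a^s$ occurs and then reads off $e = a^{r-s}$ and $a^{-1} = a^{r-s-1}$.

```python
def mul7(a, b):
    """Multiplication of nonzero integers modulo 7."""
    return (a * b) % 7


def is_closed(H, op):
    """True if the product of any two elements of H lies in H."""
    return all(op(a, b) in H for a in H for b in H)


def identity_and_inverse(a, op):
    """Follow the proof: find a repetition a^r = a^s among the powers of a."""
    powers = [a]                      # powers[k] holds a^(k+1)
    while True:
        nxt = op(powers[-1], a)
        if nxt in powers:
            break
        powers.append(nxt)
    s = powers.index(nxt) + 1         # a^r = a^s with r > s > 0
    r = len(powers) + 1
    e = powers[r - s - 1]             # a^(r-s) = e by cancellation
    # a^(r-s-1) is the inverse; when r - s = 1 it is a^0 = e = a.
    inverse = powers[r - s - 2] if r - s >= 2 else e
    return e, inverse


H = {1, 2, 4}
closed = is_closed(H, mul7)
results = {a: identity_and_inverse(a, mul7) for a in H}
```

Here H is closed, and for each $a$ in H the inverse produced by the procedure again lies in H, as the lemma asserts.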

We should, perhaps, now see some groups and some of their subgroups. 
G is always a subgroup of itself; likewise the set consisting of e is a subgroup 
of G. Neither is particularly interesting in the role of a subgroup, so we 
describe them as trivial subgroups. The subgroups between these two 
extremes we call nontrivial subgroups and it is in these we shall exhibit 
the most interest. 

Example 2.4.1 Let G be the group of integers under addition, H the 
subset consisting of all the multiples of 5. The student should check that 
H is a subgroup. 

In this example there is nothing extraordinary about 5; we could similarly
define the subgroup $H_n$ as the subset of G consisting of all the multiples of n.
$H_n$ is then a subgroup for every n. What can one say about $H_n \cap H_m$?
It might be wise to try it for $H_6 \cap H_9$.

Example 2.4.2 Let S be any set, $A(S)$ the set of one-to-one mappings
of S onto itself, made into a group under the composition of mappings. If
$x_0 \in S$, let $H(x_0) = \{\phi \in A(S) \mid x_0\phi = x_0\}$. $H(x_0)$ is a subgroup of $A(S)$.
If for $x_1 \neq x_0 \in S$ we similarly define $H(x_1)$, what is $H(x_0) \cap H(x_1)$?

Example 2.4.3 Let G be any group, $a \in G$. Let $(a) = \{a^i \mid i = 0, \pm 1,
\pm 2, \ldots\}$. $(a)$ is a subgroup of G (verify!); it is called the cyclic subgroup
generated by a. This provides us with a ready means of producing subgroups
of G. If for some choice of a, $G = (a)$, then G is said to be a cyclic group.
Such groups are very special but they play a very important role in the
theory of groups, especially in that part which deals with abelian groups.
Of course, cyclic groups are abelian, but the converse is false.

Example 2.4.4 Let G be a group, W a subset of G. Let (W) be the set 
of all elements of G representable as a product of elements of W raised to 
positive, zero, or negative integer exponents. (W) is the subgroup of G 
generated by W and is the smallest subgroup of G containing W. In fact, (W) 
is the intersection of all the subgroups of G which contain W (this intersec¬ 
tion is not vacuous since G is a subgroup of G which contains W). 
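
When G is finite, the subgroup $(W)$ can be built up by repeatedly adjoining products until nothing new appears; by Lemma 2.4.2 the closed set so obtained is a subgroup. The sketch below is ours, and it assumes that W is nonempty and that the ambient group is finite; for illustration the group is taken to be the integers modulo 12 under addition.

```python
def generated_subgroup(W, op):
    """Closure of a nonempty set W under op inside a finite group."""
    H = set(W)
    frontier = list(H)
    while frontier:
        a = frontier.pop()
        for b in list(H):
            # Try the product on both sides, since op need not commute.
            for c in (op(a, b), op(b, a)):
                if c not in H:
                    H.add(c)
                    frontier.append(c)
    return H


def add12(a, b):
    """Addition of integers modulo 12."""
    return (a + b) % 12


from_eight = generated_subgroup({8}, add12)
from_four_and_six = generated_subgroup({4, 6}, add12)
```

In this group the single element 8 generates $\{0, 4, 8\}$, while 4 and 6 together generate all the even classes.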

Example 2.4.5 Let G be the group of nonzero real numbers under 
multiplication, and let H be the subset of positive rational numbers. Then 
H is a subgroup of G. 


Example 2.4.6 Let G be the group of all real numbers under addition, 
and let H be the set of all integers. Then H is a subgroup of G. 


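# Example 2.4.7 Let G be the group of all real 2 x 2 matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$
with $ad - bc \neq 0$ under matrix multiplication. Let

$$H = \left\{ \begin{pmatrix} a & b \\ 0 & d \end{pmatrix} \in G \;\middle|\; ad \neq 0 \right\}.$$

Then, as is easily verified, H is a subgroup of G.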

# Example 2.4.8 Let H be the group of Example 2.4.7, and let
$K = \left\{ \begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix} \right\}$. Then K is a subgroup of H.


Example 2.4.9 Let G be the group of all nonzero complex numbers
$a + bi$ ($a, b$ real, not both 0) under multiplication, and let

$H = \{a + bi \in G \mid a^2 + b^2 = 1\}$.

Verify that H is a subgroup of G.

DEFINITION Let G be a group, H a subgroup of G; for $a, b \in G$ we say
$a$ is congruent to $b$ mod H, written as $a \equiv b \bmod H$, if $ab^{-1} \in H$.


LEMMA 2.4.3 The relation $a \equiv b \bmod H$ is an equivalence relation.

Proof. If we look back in Chapter 1, we see that to prove Lemma 2.4.3
we must verify the following three conditions: For all $a, b, c \in G$,

1. $a \equiv a \bmod H$.

2. $a \equiv b \bmod H$ implies $b \equiv a \bmod H$.

3. $a \equiv b \bmod H$, $b \equiv c \bmod H$ implies $a \equiv c \bmod H$.

Let’s go through each of these in turn.

1. To show that $a \equiv a \bmod H$ we must prove, using the very definition
of congruence mod H, that $aa^{-1} \in H$. Since H is a subgroup of G, $e \in H$,
and since $aa^{-1} = e$, $aa^{-1} \in H$, which is what we were required to demonstrate.

2. Suppose that $a \equiv b \bmod H$, that is, suppose $ab^{-1} \in H$; we want to
get from this $b \equiv a \bmod H$, or, equivalently, $ba^{-1} \in H$. Since $ab^{-1} \in H$,
which is a subgroup of G, $(ab^{-1})^{-1} \in H$; but, by Lemma 2.3.1, $(ab^{-1})^{-1} =
(b^{-1})^{-1}a^{-1} = ba^{-1}$, and so $ba^{-1} \in H$ and $b \equiv a \bmod H$.

3. Finally we require that $a \equiv b \bmod H$ and $b \equiv c \bmod H$ forces
$a \equiv c \bmod H$. The first congruence translates into $ab^{-1} \in H$, the second
into $bc^{-1} \in H$; using that H is a subgroup of G, $(ab^{-1})(bc^{-1}) \in H$. However,
$ac^{-1} = aec^{-1} = a(b^{-1}b)c^{-1} = (ab^{-1})(bc^{-1})$; hence $ac^{-1} \in H$, from
which it follows that $a \equiv c \bmod H$.

This establishes that congruence mod H is a bona fide equivalence
relation as defined in Chapter 1, and all results about equivalence relations
have become available to us to be used in examining this particular relation.

A word about the notation we used. If G were the group of integers under
addition, and H = Hₙ were the subgroup consisting of all multiples of n,
then in G, the relation a ≡ b mod H, that is, ab⁻¹ ∈ H, under the additive
notation, reads “a − b is a multiple of n.” This is the usual number theoretic
congruence mod n. In other words, the relation we defined using an 
arbitrary group and subgroup is the natural generalization of a familiar 
relation in a familiar group. 

DEFINITION If H is a subgroup of G, a ∈ G, then Ha = {ha | h ∈ H}.
Ha is called a right coset of H in G.

LEMMA 2.4.4 For all a ∈ G,

Ha = {x ∈ G | a ≡ x mod H}.

Proof. Let [a] = {x ∈ G | a ≡ x mod H}. We first show that Ha ⊂ [a].
For, if h ∈ H, then a(ha)⁻¹ = a(a⁻¹h⁻¹) = h⁻¹ ∈ H since H is a subgroup
of G. By the definition of congruence mod H this implies that ha ∈ [a]
for every h ∈ H, and so Ha ⊂ [a].

Suppose, now, that x ∈ [a]. Thus ax⁻¹ ∈ H, so (ax⁻¹)⁻¹ = xa⁻¹ is
also in H. That is, xa⁻¹ = h for some h ∈ H. Multiplying both sides by a
from the right we come up with x = ha, and so x ∈ Ha. Thus [a] ⊂ Ha.
Having proved the two inclusions [a] ⊂ Ha and Ha ⊂ [a], we can conclude
that [a] = Ha, which is the assertion of the lemma.

In the terminology of Chapter 1, [a], and thus Ha, is the equivalence class 
of a in G. By Theorem 1.1.1 these equivalence classes yield a decomposition 
of G into disjoint subsets. Thus any two right cosets of H in G either are identical 
or have no element in common. 
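
The decomposition into right cosets can be checked by brute force on a small group. The following is a minimal sketch, assuming S₃ is modeled as the permutations of {0, 1, 2} stored as tuples, with the product taken so that the left factor is applied first; the names mul and right_coset are illustrative and do not come from the text.

```python
from itertools import permutations

def mul(p, q):
    """Product pq of two permutations, with p applied first and then q."""
    return tuple(q[i] for i in p)

G = list(permutations(range(3)))   # the six elements of S_3
e = (0, 1, 2)                      # identity
phi = (1, 0, 2)                    # a transposition, so phi * phi = e
H = [e, phi]                       # a subgroup of order 2

def right_coset(H, a):
    """The right coset Ha = {ha | h in H}, as a frozenset."""
    return frozenset(mul(h, a) for h in H)

# Collecting Ha over all a in G keeps each distinct coset exactly once.
cosets = {right_coset(H, a) for a in G}

for c in sorted(cosets, key=sorted):
    print(sorted(c))
```

Every element of G lands in exactly one of the printed sets, and all of the sets have the same size as H.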

We now claim that between any two right cosets Ha and Hb of H in G
there exists a one-to-one correspondence, namely, with any element ha ∈ Ha,
where h ∈ H, associate the element hb ∈ Hb. Clearly this mapping is onto
Hb. We aver that it is a one-to-one correspondence, for if h₁b = h₂b, with
h₁, h₂ ∈ H, then by the cancellation law in G, h₁ = h₂ and so h₁a = h₂a.
This proves 

LEMMA 2.4.5 There is a one-to-one correspondence between any two right cosets 
of H in G. 

Lemma 2.4.5 is of most interest when H is a finite group, for then it merely 
states that any two right cosets of H have the same number of elements. 
How many elements does a right coset of H have? Well, note that H = He
is itself a right coset of H, so any right coset of H in G has o(H) elements.
Suppose now that G is a finite group, and let k be the number of distinct 
right cosets of H in G. By Lemmas 2.4.4 and 2.4.5 any two distinct right 
cosets of H in G have no element in common, and each has o(H) elements. 

Since any a ∈ G is in the unique right coset Ha, the right cosets fill out G.
Thus if k represents the number of distinct right cosets of H in G we must 
have that ko(H) = o(G). We have proved the famous theorem due to 
Lagrange, namely, 

THEOREM 2.4.1 If G is a finite group and H is a subgroup of G, then o(H)
is a divisor of o(G).

DEFINITION If H is a subgroup of G, the index of H in G is the number of
distinct right cosets of H in G.

We shall denote it by i_G(H). In case G is a finite group, i_G(H) =
o(G)/o(H), as became clear in the proof of Lagrange’s theorem. It is quite
possible for an infinite group G to have a subgroup H ≠ G which is of finite
index in G.

It might be difficult, at this point, for the student to see the extreme 
importance of this result. As the subject is penetrated more deeply one will
become more and more aware of its basic character. Because the theorem
is of such stature it merits a little closer scrutiny, a little more analysis,
and so we give, below, a slightly different way of looking at its proof. In
truth, the procedure outlined below is no different from the one already 
given. The introduction of the congruence mod H smooths out the listing 
of elements used below, and obviates the need for checking that the new 
elements introduced at each stage did not appear before. 

So suppose again that G is a finite group and that H is a subgroup of G.
Let h₁, h₂, ..., hᵣ be a complete list of the elements of H, r = o(H). If
H = G, there is nothing to prove. Suppose, then, that H ≠ G; thus there
is an a ∈ G, a ∉ H. List all the elements so far in two rows as

h₁, h₂, ..., hᵣ,
h₁a, h₂a, ..., hᵣa.

We claim that all the entries in the second line are different from each other
and are different from the entries in the first line. If any two in the second
line were equal, then hᵢa = hⱼa with i ≠ j, but by the cancellation law this
would lead to hᵢ = hⱼ, a contradiction. If an entry in the second line were
equal to one in the first line, then hᵢa = hⱼ, resulting in a = hᵢ⁻¹hⱼ ∈ H
since H is a subgroup of G; this violates a ∉ H.

Thus we have, so far, listed 2o(H) elements; if these elements account
for all the elements of G, we are done. If not, there is a b ∈ G which did not
occur in these two lines. Consider the new list

h₁, h₂, ..., hᵣ,
h₁a, h₂a, ..., hᵣa,
h₁b, h₂b, ..., hᵣb.

As before (we are now waving our hands) we could show that no two
entries in the third line are equal to each other, and that no entry in the
third line occurs in the first or second line. Thus we have listed 3o(H)
elements. Continuing in this way, every new element introduced, in fact,
produces o(H) new elements. Since G is a finite group, we must eventually
exhaust all the elements of G. But if we ended up using k lines to list all the
elements of the group, we would have written down ko(H) distinct elements,
and so ko(H) = o(G).
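
The row-by-row listing just described is easy to carry out mechanically. The sketch below is only an illustration, under the assumption that S₃ is modeled as tuples of {0, 1, 2} and H is a subgroup of order 2; the function name list_by_rows is hypothetical.

```python
from itertools import permutations

def mul(p, q):
    """Product pq of two permutations, with p applied first and then q."""
    return tuple(q[i] for i in p)

def list_by_rows(G, H):
    """List G in rows: H itself first, then h1*a, ..., hr*a for each
    element a that has not yet appeared in an earlier row."""
    rows = [list(H)]
    seen = set(H)
    for a in G:
        if a in seen:
            continue
        row = [mul(h, a) for h in H]
        # The argument in the text says a new row never repeats old entries.
        assert not seen.intersection(row)
        rows.append(row)
        seen.update(row)
    return rows

G = list(permutations(range(3)))
H = [(0, 1, 2), (1, 0, 2)]
rows = list_by_rows(G, H)
k = len(rows)
print(k, "rows of", len(H), "elements each;", k * len(H), "elements in all")
```

The number of rows k is the index of H in G, and k·o(H) recovers o(G).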

It is essential to point out that the converse to Lagrange’s theorem is 
false—a group G need not have a subgroup of order m if m is a divisor of 
o(G). For instance, a group of order 12 exists which has no subgroup of
order 6. The reader might try to find an example of this phenomenon; the
place to look is in S₄, the symmetric group of degree 4, which has a
subgroup of order 12 that will fulfill our requirement.
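
For a reader who wants to confirm the claim by machine, here is a brute-force sketch. It assumes that the subgroup of order 12 in question is the set of even permutations of four symbols (called A4 in the code), and it relies on the fact that a nonempty finite subset closed under the product is a subgroup. The helper names are illustrative only.

```python
from itertools import permutations, combinations

def mul(p, q):
    """Product pq of two permutations, with p applied first and then q."""
    return tuple(q[i] for i in p)

def is_even(p):
    """A permutation is even when it has an even number of inversions."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return inversions % 2 == 0

A4 = [p for p in permutations(range(4)) if is_even(p)]
e = tuple(range(4))

def is_subgroup(S):
    """A nonempty finite subset closed under the product is a subgroup."""
    return all(mul(a, b) in S for a in S for b in S)

def subgroups_of_order(G, identity, k):
    """All subgroups of G with exactly k elements, found by exhaustive search."""
    others = [g for g in G if g != identity]
    found = []
    for rest in combinations(others, k - 1):
        S = frozenset((identity,) + rest)
        if is_subgroup(S):
            found.append(S)
    return found

for k in (2, 3, 4, 6):
    print("order", k, ":", len(subgroups_of_order(A4, e, k)), "subgroup(s)")
```

The search is exhaustive but small: for k = 6 it examines only the 462 six-element subsets that contain the identity.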

Lagrange’s theorem has some very important corollaries. Before we 
present these we make one definition. 
DEFINITION If G is a group and a ∈ G, the order (or period) of a is the
least positive integer m such that aᵐ = e.

If no such integer exists we say that a is of infinite order. We use the
notation o(a) for the order of a. Recall our other notation: for two integers
u, v, u | v reads “u is a divisor of v.”

COROLLARY 1 If G is a finite group and a ∈ G, then o(a) | o(G).

Proof. With Lagrange’s theorem already in hand, it seems most natural
to prove the corollary by exhibiting a subgroup of G whose order is o(a).
The element a itself furnishes us with this subgroup by considering the
cyclic subgroup, (a), of G generated by a; (a) consists of e, a, a², .... How
many elements are there in (a)? We assert that this number is the order of a.
Clearly, since a^(o(a)) = e, this subgroup has at most o(a) elements. If it
should actually have fewer than this number of elements, then aⁱ = aʲ
for some integers 0 ≤ i < j < o(a). Then aʲ⁻ⁱ = e, yet 0 < j − i < o(a),
which would contradict the very meaning of o(a). Thus the cyclic subgroup
generated by a has o(a) elements, whence, by Lagrange’s theorem,
o(a) | o(G).
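
A small computation makes the corollary concrete. This is a sketch under the assumption that G is S₃, modeled as tuples of {0, 1, 2}; the helper names order and power are hypothetical. It also checks that raising any element to the power o(G) gives e.

```python
from itertools import permutations

def mul(p, q):
    """Product pq of two permutations, with p applied first and then q."""
    return tuple(q[i] for i in p)

def order(a, e):
    """Least positive integer m with a^m = e."""
    m, x = 1, a
    while x != e:
        x = mul(x, a)
        m += 1
    return m

def power(a, n, e):
    """a^n for n >= 0, computed by repeated multiplication."""
    result = e
    for _ in range(n):
        result = mul(result, a)
    return result

G = list(permutations(range(3)))
e = (0, 1, 2)
orders = {a: order(a, e) for a in G}

for a, m in orders.items():
    print(a, "has order", m)
```

Each order divides 6, the order of the group.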

COROLLARY 2 If G is a finite group and a ∈ G, then a^(o(G)) = e.

Proof. By Corollary 1, o(a) | o(G); thus o(G) = m·o(a). Therefore,
a^(o(G)) = a^(m·o(a)) = (a^(o(a)))ᵐ = eᵐ = e.

A particular case of Corollary 2 is of great interest in number theory. 
The Euler φ-function, φ(n), is defined for all integers n by the following:
φ(1) = 1; for n > 1, φ(n) = number of positive integers less than n and
relatively prime to n. Thus, for instance, φ(8) = 4 since only 1, 3, 5, 7
are the numbers less than 8 which are relatively prime to 8. In Problem 15(b)
at the end of Section 2.3 the reader was asked to prove that the numbers
less than n and relatively prime to n formed a group under multiplication
mod n. This group has order φ(n). If we apply Corollary 2 to this group
we obtain 

COROLLARY 3 (Euler) If n is a positive integer and a is relatively prime
to n, then a^(φ(n)) ≡ 1 mod n.

In order to apply Corollary 2 one should replace a by its remainder on
division by n. If n should be a prime number p, then φ(p) = p − 1. If a
is an integer relatively prime to p, then by Corollary 3, aᵖ⁻¹ ≡ 1 mod p,
whence aᵖ ≡ a mod p. If, on the other hand, a is not relatively prime to p,
since p is a prime number, we must have that p | a, so that a ≡ 0 mod p;
hence 0 ≡ aᵖ ≡ a mod p here also. Thus

COROLLARY 4 (Fermat) If p is a prime number and a is any integer, then
aᵖ ≡ a mod p.
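
Both number-theoretic corollaries can be spot-checked numerically. The sketch below computes φ(n) directly from the definition given above and tests the two congruences with Python's built-in three-argument pow; the function names euler_holds and fermat_holds are illustrative, not part of the text.

```python
from math import gcd

def phi(n):
    """Euler phi-function: phi(1) = 1, and for n > 1 the number of
    positive integers less than n and relatively prime to n."""
    if n == 1:
        return 1
    return sum(1 for k in range(1, n) if gcd(k, n) == 1)

def euler_holds(n, a):
    """Check a^phi(n) = 1 mod n (meaningful when gcd(a, n) == 1)."""
    return pow(a, phi(n), n) == 1 % n

def fermat_holds(p, a):
    """Check a^p = a mod p for a prime p and any integer a."""
    return pow(a, p, p) == a % p

print(phi(8))                 # the text computes phi(8) = 4
print(euler_holds(8, 3))      # 3 is relatively prime to 8
print(fermat_holds(7, 10))    # 7 is prime, 10 is an arbitrary integer
```

A numerical check of this kind is not a proof, of course; it only confirms the statements on the cases tried.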

COROLLARY 5 If G is a finite group whose order is a prime number p, then 
G is a cyclic group. 

Proof. First we claim that G has no nontrivial subgroups H; for o(H) 
must divide o(G) = p leaving only two possibilities, namely, o(H) = 1 or 
o(H) = p. The first of these implies H = (e), whereas the second implies
that H = G. Suppose now that a ≠ e ∈ G, and let H = (a). H is a
subgroup of G, H ≠ (e) since a ≠ e ∈ H. Thus H = G. This says that G is
cyclic and that every element in G is a power of a.

This section is of great importance in all that comes later, not only for its 
results but also because the spirit of the proofs occurring here is genuinely
group-theoretic. The student can expect to encounter other arguments 
having a similar flavor. It would be wise to assimilate the material and 
approach thoroughly, now, rather than a few theorems later when it will 
be too late. 

2.5 A Counting Principle 

As we have defined earlier, if H is a subgroup of G and a e G, then Ha 
consists of all elements in G of the form ha where h e H. Let us generalize 
this notion. If H, K are two subgroups of G, let

HK = {x ∈ G | x = hk, h ∈ H, k ∈ K}.

Let’s pause and look at an example; in S₃ let H = {e, φ}, K = {e, φψ}.
Since φ² = (φψ)² = e, both H and K are subgroups. What can we say
about HK? Just using the definition of HK we can see that HK consists of
the elements e, φ, φψ, φ²ψ = ψ. Since HK consists of four elements and
4 is not a divisor of 6, the order of S₃, by Lagrange’s theorem HK could not
be a subgroup of S₃. (Of course, we could verify this directly but it does
not hurt to keep recalling Lagrange’s theorem.) We might try to find out
why HK is not a subgroup. Note that KH = {e, φ, φψ, φψφ = ψ⁻¹} ≠ HK.
This is precisely why HK fails to be a subgroup, as we see in the next lemma. 
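
The example can be reproduced by direct computation. In the sketch below, two distinct transpositions of {0, 1, 2} stand in for the elements of order 2 that generate H and K; this choice, and the names product_set and is_subgroup, are assumptions made for illustration.

```python
def mul(p, q):
    """Product pq of two permutations, with p applied first and then q."""
    return tuple(q[i] for i in p)

def product_set(A, B):
    """The set AB = {ab | a in A, b in B}."""
    return {mul(a, b) for a in A for b in B}

def is_subgroup(S):
    """A nonempty finite subset closed under the product is a subgroup."""
    return all(mul(a, b) in S for a in S for b in S)

e = (0, 1, 2)
s = (1, 0, 2)      # a transposition
t = (0, 2, 1)      # a different transposition
H = {e, s}
K = {e, t}

HK = product_set(H, K)
KH = product_set(K, H)
print(sorted(HK))
print(sorted(KH))
print(HK == KH, is_subgroup(HK))
```

Both products have four elements, they differ in one element, and neither is closed under multiplication.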

LEMMA 2.5.1 HK is a subgroup of G if and only if HK = KH. 

Proof. Suppose, first, that HK = KH; that is, if h ∈ H and k ∈ K,
then hk = k₁h₁ for some k₁ ∈ K, h₁ ∈ H (it need not be that k₁ = k or
h₁ = h!). To prove that HK is a subgroup we must verify that it is closed
and every element in HK has its inverse in HK. Let’s show the closure
first; so suppose x = hk ∈ HK and y = h'k' ∈ HK. Then xy = hkh'k',
but since kh' ∈ KH = HK, kh' = h₂k₂ with h₂ ∈ H, k₂ ∈ K. Hence xy =
h(h₂k₂)k' = (hh₂)(k₂k') ∈ HK, and HK is closed. Also x⁻¹ = (hk)⁻¹ =
k⁻¹h⁻¹ ∈ KH = HK, so x⁻¹ ∈ HK. Thus HK is a subgroup of G.

On the other hand, if HK is a subgroup of G, then for any h ∈ H, k ∈ K,
h⁻¹k⁻¹ ∈ HK and so kh = (h⁻¹k⁻¹)⁻¹ ∈ HK. Thus KH ⊂ HK. Now if
x is any element of HK, x⁻¹ = hk ∈ HK and so x = (x⁻¹)⁻¹ = (hk)⁻¹ =
k⁻¹h⁻¹ ∈ KH, so HK ⊂ KH. Thus HK = KH.

An interesting special case is the situation when G is an abelian group 
for in that case trivially HK = KH. Thus as a consequence we have the 

COROLLARY If H, K are subgroups of the abelian group G, then HK is a 
subgroup of G. 

If H, K are subgroups of a group G, we have seen that the subset HK 
need not be a subgroup of G. Yet it is a perfectly meaningful question to ask:
How many distinct elements are there in the subset HK? If we denote this 
number by o(HK), we prove 


THEOREM 2.5.1 If H and K are finite subgroups of G of orders o(H) and
o(K), respectively, then

o(HK) = o(H)o(K) / o(H ∩ K).

Proof. Although there is no need to pay special attention to the particular
case in which H ∩ K = (e), looking at this case, which is devoid of some
of the complexity of the general situation, is quite revealing. Here we
should seek to show that o(HK) = o(H)o(K). One should ask oneself: How
could this fail to happen? The answer clearly must be that if we list all the
elements hk, h ∈ H, k ∈ K there should be some collapsing; that is, some
element in the list must appear at least twice. Equivalently, for some
h ≠ h₁ ∈ H, hk = h₁k₁. But then h₁⁻¹h = k₁k⁻¹; now since h₁ ∈ H,
h₁⁻¹ must also be in H, thus h₁⁻¹h ∈ H. Similarly, k₁k⁻¹ ∈ K. Since
h₁⁻¹h = k₁k⁻¹, h₁⁻¹h ∈ H ∩ K = (e), so h₁⁻¹h = e, whence h = h₁, a
contradiction. We have proved that no collapsing can occur, and so, here,
o(HK) is indeed o(H)o(K).

With this experience behind us we are ready to attack the general case.
As above we must ask: How often does a given element hk appear as a
product in the list of HK? We assert it must appear o(H ∩ K) times!
To see this we first remark that if h₁ ∈ H ∩ K, then

hk = (hh₁)(h₁⁻¹k),     (1)

where hh₁ ∈ H, since h ∈ H, h₁ ∈ H ∩ K ⊂ H and h₁⁻¹k ∈ K since
h₁⁻¹ ∈ H ∩ K ⊂ K and k ∈ K. Thus hk is duplicated in the product at
least o(H ∩ K) times. However, if hk = h'k', then h⁻¹h' = k(k')⁻¹ = u,
and u ∈ H ∩ K, and so h' = hu, k' = u⁻¹k; thus all duplications were
accounted for in (1). Consequently hk appears in the list of HK exactly
o(H ∩ K) times. Thus the number of distinct elements in HK is the total
number in the listing of HK, that is, o(H)o(K), divided by the number of
times a given element appears, namely, o(H ∩ K). This proves the theorem.
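
The counting argument can be watched in action on a case where H ∩ K is larger than (e). The sketch below is an illustration only: it assumes the integers mod 12 under addition as the group (so the "product" hk is written h + k), with H the multiples of 2 and K the multiples of 3; none of these choices comes from the text.

```python
from collections import Counter

n = 12
H = {x for x in range(n) if x % 2 == 0}   # subgroup of order 6
K = {x for x in range(n) if x % 3 == 0}   # subgroup of order 4

# List every combination of h and k, repetitions included, as in the proof.
listing = Counter((h + k) % n for h in H for k in K)
HK = set(listing)

print("o(H), o(K), o(H n K):", len(H), len(K), len(H & K))
print("o(HK):", len(HK))
print("multiplicities seen:", set(listing.values()))
```

Each element of HK appears in the listing exactly o(H ∩ K) times, so the number of distinct elements is o(H)o(K) divided by o(H ∩ K).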

Suppose H, K are subgroups of the finite group G and o(H) > √(o(G)),
o(K) > √(o(G)). Since HK ⊂ G, o(HK) ≤ o(G). However,

o(G) ≥ o(HK) = o(H)o(K)/o(H ∩ K) > √(o(G))·√(o(G))/o(H ∩ K) = o(G)/o(H ∩ K);

thus o(H ∩ K) > 1. Therefore, H ∩ K ≠ (e). We have proved the

COROLLARY If H and K are subgroups of G and o(H) > √(o(G)), o(K) >
√(o(G)), then H ∩ K ≠ (e).


We apply this corollary to a very special group. Suppose G is a finite 
group of order pq where p and q are prime numbers with p > q. We claim 
that G can have at most one subgroup of order p. For suppose H, K are 
subgroups of order p. By the corollary, H ∩ K ≠ (e), and being a
subgroup of H, which having prime order has no nontrivial subgroups, we
must conclude that H ∩ K = H, and so H ⊂ H ∩ K ⊂ K. Similarly
K ⊂ H, whence H = K, proving that there is at most one subgroup of
order p. Later on we shall see that there is at least one subgroup of order p,
which, combined with the above, will tell us there is exactly one subgroup 
of order p in G. From this we shall be able to determine completely the 
structure of G. 


Problems 

1. If H and K are subgroups of G, show that H ∩ K is a subgroup of G.
(Can you see that the same proof shows that the intersection of any
number of subgroups of G, finite or infinite, is again a subgroup of G?)

2. Let G be a group such that the intersection of all its subgroups which
are different from (e) is a subgroup different from (e). Prove that
every element in G has finite order.

3. If G has no nontrivial subgroups, show that G must be finite of 
prime order. 
4. (a) If H is a subgroup of G, and a ∈ G let aHa⁻¹ = {aha⁻¹ | h ∈ H}.
Show that aHa⁻¹ is a subgroup of G.

(b) If H is finite, what is o(aHa⁻¹)?

5. For a subgroup H of G define the left coset aH of H in G as the set
of all elements of the form ah, h ∈ H. Show that there is a one-to-one
correspondence between the set of left cosets of H in G and the set of
right cosets of H in G.

6. Write out all the right cosets of H in G where

(a) G = (a) is a cyclic group of order 10 and H = (a²) is the
subgroup of G generated by a².

(b) G as in part (a), H = (a⁵) is the subgroup of G generated by a⁵.

(c) G = A(S), S = {x₁, x₂, x₃}, and H = {σ ∈ G | x₁σ = x₁}.

7. Write out all the left cosets of H in G for H and G as in parts (a),
(b), (c) of Problem 6.

8. Is every right coset of H in G a left coset of H in G in the groups of 
Problem 6? 

9. Suppose that H is a subgroup of G such that whenever Ha ≠ Hb
then aH ≠ bH. Prove that gHg⁻¹ ⊂ H for all g ∈ G.

10. Let G be the group of integers under addition, Hₙ the subgroup
consisting of all multiples of a fixed integer n in G. Determine the
index of Hₙ in G and write out all the right cosets of Hₙ in G.

11. In Problem 10, what is Hₙ ∩ Hₘ?

12. If G is a group and H, K are two subgroups of finite index in G,
prove that H ∩ K is of finite index in G. Can you find an upper
bound for the index of H ∩ K in G?

13. If a ∈ G, define N(a) = {x ∈ G | xa = ax}. Show that N(a) is a
subgroup of G. N(a) is usually called the normalizer or centralizer of
a in G.

14. If H is a subgroup of G, then by the centralizer C(H) of H we mean
the set {x ∈ G | xh = hx all h ∈ H}. Prove that C(H) is a subgroup
of G.

15. The center Z of a group G is defined by Z = {z ∈ G | zx = xz all
x ∈ G}. Prove that Z is a subgroup of G. Can you recognize Z as
C(T) for some subgroup T of G?

16. If H is a subgroup of G, let N(H) = {a ∈ G | aHa⁻¹ = H} [see
Problem 4(a)]. Prove that

(a) N(H) is a subgroup of G. (b) N(H) ⊃ H.

17. Give an example of a group G and a subgroup H such that N(H) ≠
C(H). Is there any containing relation between N(H) and C(H)?

18. If H is a subgroup of G let

N = ⋂ xHx⁻¹   (the intersection taken over all x ∈ G).

Prove that N is a subgroup of G such that aNa⁻¹ = N for all a ∈ G.

*19. If H is a subgroup of finite index in G, prove that there is only a
finite number of distinct subgroups in G of the form aHa⁻¹.

*20. If H is of finite index in G prove that there is a subgroup N of G,
contained in H, and of finite index in G such that aNa⁻¹ = N for
all a ∈ G. Can you give an upper bound for the index of this
N in G?

21. Let the mapping τ_ab for a, b real numbers, map the reals into the
reals by the rule τ_ab: x → ax + b. Let G = {τ_ab | a ≠ 0}. Prove
that G is a group under the composition of mappings. Find the
formula for τ_ab τ_cd.

22. In Problem 21, let H = {τ_ab ∈ G | a is rational}. Show that H is
a subgroup of G. List all the right cosets of H in G, and all the left
cosets of H in G. From this show that every left coset of H in G is a
right coset of H in G.

23. In the group G of Problem 21, let N = {τ_1b ∈ G}. Prove

(a) N is a subgroup of G.

(b) If a ∈ G, n ∈ N, then ana⁻¹ ∈ N.

*24. Let G be a finite group whose order is not divisible by 3. Suppose
that (ab)³ = a³b³ for all a, b ∈ G. Prove that G must be abelian.

*25. Let G be an abelian group and suppose that G has elements of orders 
m and n, respectively. Prove that G has an element whose order is 
the least common multiple of m and n. 

**26. If an abelian group has subgroups of orders m and n, respectively, 
then show it has a subgroup whose order is the least common multiple 
of m and n. (Don’t be discouraged if you don’t get this problem with 
what you know about group theory up to this stage. I don’t know 
anybody, including myself, who has done it subject to the restriction 
of using material developed so far in the text. But it is fun to try. 
I’ve had more correspondence about this problem than about any 
other point in the whole book.) 

27. Prove that any subgroup of a cyclic group is itself a cyclic group. 

28. How many generators does a cyclic group of order n have? (b ∈ G
is a generator if (b) = G.)

Let Uₙ denote the integers relatively prime to n under multiplication
mod n. In Problem 15(b), Section 2.3, it is indicated that Uₙ is a group.
In the next few problems we look at the nature of Uₙ as a group for some
specific values of n.

29. Show that U₈ is not a cyclic group.

30. Show that U₉ is a cyclic group. What are all its generators?

31. Show that U₁₇ is a cyclic group. What are all its generators?

32. Show that U₁₈ is a cyclic group.

33. Show that U₂₀ is not a cyclic group.

34. Show that both U₂₅ and U₂₇ are cyclic groups.

35. Hazard a guess at what all the n such that Uₙ is cyclic are. (You
can verify your guess by looking in any reasonable book on number
theory.)

36. If a ∈ G and aᵐ = e, prove that o(a) | m.

37. If in the group G, a⁵ = e, aba⁻¹ = b² for some a, b ∈ G, find o(b).

*38. Let G be a finite abelian group in which the number of solutions in
G of the equation xⁿ = e is at most n for every positive integer n.
Prove that G must be a cyclic group.

39. Let G be a group and A, B subgroups of G. If x, y ∈ G define x ~ y
if y = axb for some a ∈ A, b ∈ B. Prove

(a) The relation so defined is an equivalence relation.

(b) The equivalence class of x is AxB = {axb | a ∈ A, b ∈ B}.
(AxB is called a double coset of A and B in G.)

40. If G is a finite group, show that the number of elements in the double
coset AxB is

o(A)o(B) / o(A ∩ xBx⁻¹).

41. If G is a finite group and A is a subgroup of G such that all double
cosets AxA have the same number of elements, show that gAg⁻¹ = A
for all g ∈ G.

2.6 Normal Subgroups and Quotient Groups 

Let G be the group S₃ and let H be the subgroup {e, φ}. Since the index
of H in G is 3, there are three right cosets of H in G and three left cosets of
H in G. We list them:


Right Cosets                Left Cosets

H = {e, φ}                  H = {e, φ}
Hψ = {ψ, φψ}                ψH = {ψ, ψφ = φψ²}
Hψ² = {ψ², φψ²}             ψ²H = {ψ², ψ²φ = φψ}


A quick inspection yields the interesting fact that the right coset Hψ is not
a left coset. Thus, at least for this subgroup, the notions of left and right 
coset need not coincide. 

In G = S₃ let us consider the subgroup N = {e, ψ, ψ²}. Since the
index of N in G is 2 there are two left cosets and two right cosets of N in G.
We list these:


Right Cosets                Left Cosets

N = {e, ψ, ψ²}              N = {e, ψ, ψ²}
Nφ = {φ, ψφ, ψ²φ}           φN = {φ, φψ, φψ²}
                               = {φ, ψ²φ, ψφ}

A quick inspection here reveals that every left coset of N in G is a right 
coset in G and conversely. Thus we see that for some subgroups the notion 
of left coset coincides with that of right coset, whereas for some subgroups 
these concepts differ. 

It is a tribute to the genius of Galois that he recognized that those
subgroups for which the left and right cosets coincide are distinguished ones.
Very often in mathematics the crucial problem is to recognize and to discover 
what are the relevant concepts; once this is accomplished the job may be 
more than half done. 

We shall define this special class of subgroups in a slightly different way, 
which we shall then show to be equivalent to the remarks in the above 
paragraph. 

DEFINITION A subgroup N of G is said to be a normal subgroup of G if
for every g ∈ G and n ∈ N, gng⁻¹ ∈ N.

Equivalently, if by gNg⁻¹ we mean the set of all gng⁻¹, n ∈ N, then N
is a normal subgroup of G if and only if gNg⁻¹ ⊂ N for every g ∈ G.

LEMMA 2.6.1 N is a normal subgroup of G if and only if gNg⁻¹ = N for
every g ∈ G.

Proof. If gNg⁻¹ = N for every g ∈ G, certainly gNg⁻¹ ⊂ N, so N is
normal in G.

Suppose that N is normal in G. Thus if g ∈ G, gNg⁻¹ ⊂ N and g⁻¹Ng =
g⁻¹N(g⁻¹)⁻¹ ⊂ N. Now, since g⁻¹Ng ⊂ N, N = g(g⁻¹Ng)g⁻¹ ⊂
gNg⁻¹ ⊂ N, whence N = gNg⁻¹.

In order to avoid a point of confusion here let us stress that Lemma 2.6.1
does not say that for every n ∈ N and every g ∈ G, gng⁻¹ = n. No! This
can be false. Take, for instance, the group G to be S₃ and N to be the
subgroup {e, ψ, ψ²}. If we compute φNφ⁻¹ we obtain {e, φψφ⁻¹, φψ²φ⁻¹} =
{e, ψ², ψ}, yet φψφ⁻¹ ≠ ψ. All we require is that the set of elements
gNg⁻¹ be the same as the set of elements N.

We now can return to the question of the equality of left cosets and 
right cosets. 

LEMMA 2.6.2 The subgroup N of G is a normal subgroup of G if and only if 
every left coset of N in G is a right coset of N in G. 

Proof. If N is a normal subgroup of G, then for every g ∈ G, gNg⁻¹ =
N, whence (gNg⁻¹)g = Ng; equivalently gN = Ng, and so the left coset
gN is the right coset Ng.

Suppose, conversely, that every left coset of N in G is a right coset of
N in G. Thus, for g ∈ G, gN, being a left coset, must be a right coset.
What right coset can it be?

Since g = ge ∈ gN, whatever right coset gN turns out to be, it must
contain the element g; however, g is in the right coset Ng, and two distinct
right cosets have no element in common. (Remember the proof of Lagrange’s
theorem?) So this right coset is unique. Thus gN = Ng follows. In other
words, gNg⁻¹ = Ngg⁻¹ = N, and so N is a normal subgroup of G.
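
Lemma 2.6.2 gives a test for normality that is easy to run on a small group. The sketch below assumes S₃ modeled as tuples of {0, 1, 2}, takes N to be the three even permutations and H a subgroup of order 2, and compares gN with Ng for every g; the name is_normal is hypothetical.

```python
from itertools import permutations

def mul(p, q):
    """Product pq of two permutations, with p applied first and then q."""
    return tuple(q[i] for i in p)

def is_normal(N, G):
    """Test of Lemma 2.6.2: N is normal when gN = Ng for every g in G."""
    return all(
        {mul(g, n) for n in N} == {mul(n, g) for n in N}
        for g in G
    )

G = list(permutations(range(3)))
e = (0, 1, 2)
N = {e, (1, 2, 0), (2, 0, 1)}   # identity and the two 3-cycles
H = {e, (1, 0, 2)}              # identity and one transposition

print(is_normal(N, G))
print(is_normal(H, G))
```

The subgroup of order 3 passes the test and the subgroup of order 2 fails it, in agreement with the two coset tables given earlier in this section.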

We have already defined what is meant by HK whenever H, K are 
subgroups of G. We can easily extend this definition to arbitrary subsets, 
and we do so by defining, for two subsets, A and B, of G, AB = {x ∈ G | x =
ab, a ∈ A, b ∈ B}. As a special case, what can we say when A = B = H,
a subgroup of G? HH = {h₁h₂ | h₁, h₂ ∈ H} ⊂ H since H is closed under
multiplication. But HH ⊃ He = H since e ∈ H. Thus HH = H.

Suppose that N is a normal subgroup of G, and that a, b e G. Consider 
(Na)(Nb); since N is normal in G, aN = Na, and so 

NaNb = N(aN)b = N(Na)b = NNab = Nab.

What a world of possibilities this little formula opens! But before we get
carried away, for emphasis and future reference we record this as


LEMMA 2.6.3 A subgroup N of G is a normal subgroup of G if and only if the 
product of two right cosets of N in G is again a right coset of N in G. 

Proof. If N is normal in G we have just proved the result. The proof of 
the other half is one of the problems at the end of this section. 


Suppose that N is a normal subgroup of G. The formula NaNb = Nab,
for a, b ∈ G is highly suggestive; the product of right cosets is a right coset.
Can we use this product to make the collection of right cosets into a group?
Indeed we can! This type of construction, often occurring in mathematics
and usually called forming a quotient structure, is of the utmost importance.
Let G/N denote the collection of right cosets of N in G (that is, the
elements of G/N are certain subsets of G) and we use the product of subsets
of G to yield for us a product in G/N.

For this product we claim 

1. X, Y ∈ G/N implies XY ∈ G/N; for X = Na, Y = Nb for some a, b ∈ G,
and XY = NaNb = Nab ∈ G/N.

2. X, Y, Z ∈ G/N, then X = Na, Y = Nb, Z = Nc with a, b, c ∈ G,
and so (XY)Z = (NaNb)Nc = N(ab)Nc = N(ab)c = Na(bc) (since G
is associative) = Na(Nbc) = Na(NbNc) = X(YZ). Thus the product
in G/N satisfies the associative law.

3. Consider the element N = Ne ∈ G/N. If X ∈ G/N, X = Na, a ∈ G,
so XN = NaNe = Nae = Na = X, and similarly NX = X.
Consequently, Ne is an identity element for G/N.

4. Suppose X = Na ∈ G/N (where a ∈ G); thus Na⁻¹ ∈ G/N, and
NaNa⁻¹ = Naa⁻¹ = Ne. Similarly Na⁻¹Na = Ne. Hence Na⁻¹ is
the inverse of Na in G/N.

But a system which satisfies 1 , 2 , 3, 4 is exactly what we called a group. 
That is, 

THEOREM 2.6.1 If G is a group, N a normal subgroup of G, then G/N is also
a group. It is called the quotient group or factor group of G by N.

If, in addition, G is a finite group, what is the order of G/N? Since G/N
has as its elements the right cosets of N in G, and since there are precisely
i_G(N) = o(G)/o(N) such cosets, we can say

LEMMA 2.6.4 If G is a finite group and N is a normal subgroup of G, then
o(G/N) = o(G)/o(N).

We close this section with an example. 

Let G be the group of integers under addition and let N be the set of
all multiples of 3. Since the operation in G is addition we shall write the
cosets of N in G as N + a rather than as Na. Consider the three cosets
N, N + 1, N + 2. We claim that these are all the cosets of N in G. For,
given a ∈ G, a = 3b + c where b ∈ G and c = 0, 1, or 2 (c is the remainder
of a on division by 3). Thus N + a = N + 3b + c = (N + 3b) + c =
N + c since 3b ∈ N. Thus every coset is, as we stated, one of N, N + 1,
or N + 2, and G/N = {N, N + 1, N + 2}. How do we add elements in
G/N? Our formula NaNb = Nab translates into: (N + 1) + (N + 2) =
N + 3 = N since 3 ∈ N; (N + 2) + (N + 2) = N + 4 = N + 1 and
so on. Without being specific one feels that G/N is closely related to the
integers mod 3 under addition. Clearly what we did for 3 we could emulate
for any integer n, in which case the factor group should suggest a relation
to the integers mod n under addition. This type of relation will be clarified
in the next section.
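
The coset arithmetic of this example can be mimicked in a few lines. In the sketch below each coset N + a is labeled by the remainder of a on division by n, which is one possible choice of representative; the function names coset and add_cosets are assumptions made for illustration.

```python
def coset(a, n=3):
    """Label of the coset N + a, where N is the set of all multiples of n.
    The label is the remainder c in a = n*b + c with 0 <= c < n."""
    return a % n

def add_cosets(x, y, n=3):
    """Sum of the cosets labeled x and y: (N + x) + (N + y) = N + (x + y)."""
    return (x + y) % n

# The two sums worked out in the text.
print(add_cosets(coset(1), coset(2)))   # the coset N + 3 = N, label 0
print(add_cosets(coset(2), coset(2)))   # the coset N + 4 = N + 1, label 1

# The sum does not depend on which representatives are chosen.
print(add_cosets(coset(7), coset(-4)) == coset(7 + (-4)))
```

That the last line prints True for any pair of integers is exactly the statement that the sum of two cosets is well defined.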

Problems

1. If H is a subgroup of G such that the product of two right cosets of 
H in G is again a right coset of H in G, prove that H is normal in G. 

2. If G is a group and H is a subgroup of index 2 in G, prove that H is 
a normal subgroup of G. 

3. If N is a normal subgroup of G and H is any subgroup of G, prove 
that NH is a subgroup of G. 

4. Show that the intersection of two normal subgroups of G is a normal 
subgroup of G. 

5. If H is a subgroup of G and N is a normal subgroup of G, show that
H ∩ N is a normal subgroup of H.

6. Show that every subgroup of an abelian group is normal. 

*7. Is the converse of Problem 6 true? If yes, prove it, if no, give an 
example of a non-abelian group all of whose subgroups are normal. 

8. Give an example of a group G, subgroup H, and an element a ∈ G
such that aHa⁻¹ ⊂ H but aHa⁻¹ ≠ H.

9. Suppose H is the only subgroup of order o(H ) in the finite group G. 
Prove that H is a normal subgroup of G. 

10. If H is a subgroup of G, let N(H) = {g ∈ G | gHg⁻¹ = H}. Prove

(a) N(H ) is a subgroup of G. 

(b) H is normal in N(H). 

(c) If H is a normal subgroup of the subgroup K in G, then K ⊂ N(H)
(that is, N(H) is the largest subgroup of G in which H is normal). 

(d) H is normal in G if and only if N(H) = G. 

11. If N and M are normal subgroups of G, prove that NM is also a 
normal subgroup of G. 

*12. Suppose that N and M are two normal subgroups of G and that
N ∩ M = (e). Show that for any n ∈ N, m ∈ M, nm = mn.

13. If a cyclic subgroup T of G is normal in G, then show that every
subgroup of T is normal in G.

*14. Prove, by an example, that we can find three groups E ⊂ F ⊂ G,
where E is normal in F, F is normal in G, but E is not normal in G.

15. If N is normal in G and a ∈ G is of order o(a), prove that the order,
m, of Na in G/N is a divisor of o(a).
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16. If N is a normal subgroup ip the finite group such that i G (N) and 
o(N) are relatively prime, show that any element x e G satisfying 

^o(JV) _ g mus ^ j-, e j n ^ 

17. Let G be defined as all formal symbols x^i y^j, i = 0, 1, j = 0, 1, 2, ..., 
n − 1, where we assume 

x^i y^j = x^{i'} y^{j'} if and only if i = i', j = j' 
x² = yⁿ = e, n > 2 
xy = y⁻¹x. 

(a) Find the form of the product (x^i y^j)(x^k y^l) as x^α y^β. 

(b) Using this, prove that G is a non-abelian group of order 2n. 

(c) If n is odd, prove that the center of G is (e), while if n is even 
the center of G is larger than (e). 

This group is known as a dihedral group. A geometric realization of 
this is obtained as follows: let y be a rotation of the Euclidean plane 
about the origin through an angle of 2π/n, and x the reflection about 
the vertical axis. G is the group of motions of the plane generated by 
y and x. 

18. Let G be a group in which, for some integer n > 1, (ab)ⁿ = aⁿbⁿ 
for all a, b ∈ G. Show that 

(a) G^{(n)} = {xⁿ | x ∈ G} is a normal subgroup of G. 

(b) G^{(n−1)} = {x^{n−1} | x ∈ G} is a normal subgroup of G. 

19. Let G be as in Problem 18. Show 

(a) a^{n−1}bⁿ = bⁿa^{n−1} for all a, b ∈ G. 

(b) (aba⁻¹b⁻¹)^{n(n−1)} = e for all a, b ∈ G. 

20. Let G be a group such that (ab)^p = a^p b^p for all a, b ∈ G, where p is 
a prime number. Let S = {x ∈ G | x^{p^m} = e for some m depending 
on x}. Prove 

(a) S is a normal subgroup of G. 

(b) If Ḡ = G/S and if x̄ ∈ Ḡ is such that x̄^p = ē then x̄ = ē. 


#21. Let G be the set of all real 2×2 matrices 

    ( a  b )
    ( 0  d )

where ad ≠ 0, under matrix multiplication. Let N be the set of all 
matrices in G of the form 

    ( 1  b )
    ( 0  1 ). 

Prove that 

(a) N is a normal subgroup of G. 

(b) G/N is abelian. 


2.7 Homomorphisms 

The ideas and results in this section are closely interwoven with those of the 
preceding one. If there is one central idea which is common to all aspects 
of modern algebra it is the notion of homomorphism. By this one means 
a mapping from one algebraic system to a like algebraic system which 
preserves structure. We make this precise, for groups, in the next definition. 

DEFINITION A mapping φ from a group G into a group Ḡ is said to be a 
homomorphism if for all a, b ∈ G, φ(ab) = φ(a)φ(b). 

Notice that on the left side of this relation, namely, in the term φ(ab), 
the product ab is computed in G using the product of elements of G, whereas 
on the right side of this relation, namely, in the term φ(a)φ(b), the product 
is that of elements in Ḡ. 

Example 2.7.0 φ(x) = e for all x ∈ G. This is trivially a homomorphism. 

Likewise φ(x) = x for every x ∈ G is a homomorphism. 

Example 2.7.1 Let G be the group of all real numbers under addition 
(i.e., ab for a, b ∈ G is really the real number a + b) and let Ḡ be the group 
of nonzero real numbers with the product being ordinary multiplication of 
real numbers. Define φ: G → Ḡ by φ(a) = 2^a. In order to verify that 
this mapping is a homomorphism we must check to see whether φ(ab) = 
φ(a)φ(b), remembering that by the product on the left side we mean the 
operation in G (namely, addition), that is, we must check if 2^{a+b} = 2^a 2^b, 
which indeed is true. Since 2^a is always positive, the image of φ is not all 
of Ḡ, so φ is a homomorphism of G into Ḡ, but not onto Ḡ. 

Example 2.7.2 Let G = S₃ = {e, φ, ψ, ψ², φψ, φψ²} and Ḡ = {e, φ}. 
Define the mapping f: G → Ḡ by f(φ^i ψ^j) = φ^i. Thus f(e) = e, f(φ) = φ, 
f(ψ) = e, f(ψ²) = e, f(φψ) = φ, f(φψ²) = φ. The reader should 
verify that f so defined is a homomorphism. 

Example 2.7.3 Let G be the group of integers under addition and let 
Ḡ = G. For the integer x ∈ G define φ by φ(x) = 2x. That φ is a 
homomorphism then follows from φ(x + y) = 2(x + y) = 2x + 2y = φ(x) + φ(y). 

Example 2.7.4 Let G be the group of nonzero real numbers under 
multiplication, Ḡ = {1, −1}, where 1·1 = 1, (−1)(−1) = 1, 1(−1) = 
(−1)1 = −1. Define φ: G → Ḡ by φ(x) = 1 if x is positive, φ(x) = −1 if 
x is negative. The fact that φ is a homomorphism is equivalent to the 
statements: positive times positive is positive, positive times negative is 
negative, negative times negative is positive. 

Example 2.7.5 Let G be the group of integers under addition, let Ḡ_n be 
the group of integers under addition modulo n. Define φ by φ(x) = 
remainder of x on division by n. One can easily verify this is a 
homomorphism. 

Example 2.7.6 Let G be the group of positive real numbers under 
multiplication and let Ḡ be the group of all real numbers under addition. 
Define φ: G → Ḡ by φ(x) = log₁₀ x. Thus 

φ(xy) = log₁₀(xy) = log₁₀(x) + log₁₀(y) = φ(x)φ(y) 

since the operation, on the right side, in Ḡ is in fact addition. Thus φ is a 
homomorphism of G into Ḡ. In fact, not only is φ a homomorphism but, 
in addition, it is one-to-one and onto. 
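
The verifications asked for in Examples 2.7.5 and 2.7.6 can be spot-checked 
by machine. The following Python sketch is an illustration only and is not 
part of the original text: the names phi_mod and phi_log are our own, a 
finite range of integers stands in for the whole group of integers, and the 
logarithm identity is tested only up to floating-point rounding. It offers 
evidence, not a proof. 

```python
import math

def phi_mod(x: int, n: int) -> int:
    """Example 2.7.5: remainder of x on division by n (n > 0)."""
    return x % n  # Python's % returns a value in 0..n-1 even for negative x

def phi_log(x: float) -> float:
    """Example 2.7.6: the common logarithm of a positive real number."""
    return math.log10(x)

n = 6  # an arbitrary modulus chosen for the illustration
integers = range(-20, 21)

# phi(x + y) must equal phi(x) + phi(y), the right-hand sum taken modulo n.
mod_ok = all(
    phi_mod(x + y, n) == (phi_mod(x, n) + phi_mod(y, n)) % n
    for x in integers for y in integers
)

# The product in G (multiplication) must go over to the product in the
# image group (addition); equality is tested up to rounding error.
positives = [0.001, 0.5, 1.0, 2.0, 7.25, 1000.0]
log_ok = all(
    math.isclose(phi_log(x * y), phi_log(x) + phi_log(y), abs_tol=1e-12)
    for x in positives for y in positives
)

print(mod_ok, log_ok)
```
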

# Example 2.7.7 Let G be the group of all real 2×2 matrices 

    ( a  b )
    ( c  d )

such that ad − bc ≠ 0, under matrix multiplication. Let Ḡ be the group 
of all nonzero real numbers under multiplication. Define φ: G → Ḡ by 

    φ( a  b ) = ad − bc. 
     ( c  d )

We leave it to the reader to check that φ is a homomorphism of G onto Ḡ. 
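
As a sanity check on Example 2.7.7 (not a substitute for the verification 
left to the reader), one can test the multiplicative property of ad − bc on a 
finite sample. The sketch below is our own illustration; the helper names 
matmul and det are hypothetical, and only matrices with small integer entries 
are tried, so that all arithmetic is exact. 

```python
from itertools import product

def matmul(A, B):
    """Product of two 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

def det(A):
    """The value ad - bc used to define the mapping in Example 2.7.7."""
    (a, b), (c, d) = A
    return a * d - b * c

# All matrices with entries in {-2, ..., 2} and ad - bc != 0.
entries = range(-2, 3)
sample = [((a, b), (c, d))
          for a, b, c, d in product(entries, repeat=4)
          if a * d - b * c != 0]

# phi(AB) = phi(A) phi(B) on every pair from the sample.
det_is_multiplicative = all(
    det(matmul(A, B)) == det(A) * det(B) for A in sample for B in sample
)
print(det_is_multiplicative)
```
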

The result of the following lemma yields, for us, an infinite class of 
examples of homomorphisms. When we prove Theorem 2.7.1 it will turn 
out that in some sense this provides us with the most general example of a 
homomorphism. 

LEMMA 2.7.1 Suppose G is a group, N a normal subgroup of G; define the 
mapping φ from G to G/N by φ(x) = Nx for all x ∈ G. Then φ is a 
homomorphism of G onto G/N. 

Proof. In actuality, there is nothing to prove, for we already have 
proved this fact several times. But for the sake of emphasis we repeat it. 

That φ is onto is trivial, for every element X ∈ G/N is of the form 
X = Ny, y ∈ G, so X = φ(y). To verify the multiplicative property 
required in order that φ be a homomorphism, one just notes that if 
x, y ∈ G, 

φ(xy) = Nxy = NxNy = φ(x)φ(y). 

In Lemma 2.7.1 and in the examples preceding it, a fact which comes 
through is that a homomorphism need not be one-to-one; but there is a 
certain uniformity in this process of deviating from one-to-oneness. This 
will become apparent in a few lines. 

DEFINITION If φ is a homomorphism of G into Ḡ, the kernel of φ, K_φ, is 
defined by K_φ = {x ∈ G | φ(x) = ē, ē = identity element of Ḡ}. 

Before investigating any properties of K_φ it is advisable to establish that, 
as a set, K_φ is not empty. This is furnished us by the first part of 

LEMMA 2.7.2 If φ is a homomorphism of G into Ḡ, then 

1. φ(e) = ē, the unit element of Ḡ. 

2. φ(x⁻¹) = φ(x)⁻¹ for all x ∈ G. 

Proof. To prove (1) we merely calculate φ(x)ē = φ(x) = φ(xe) = 
φ(x)φ(e), so by the cancellation property in Ḡ we have that φ(e) = ē. 

To establish (2) one notes that ē = φ(e) = φ(xx⁻¹) = φ(x)φ(x⁻¹), so 
by the very definition of φ(x)⁻¹ in Ḡ we obtain the result that φ(x⁻¹) = 
φ(x)⁻¹. 

The argument used in the proof of Lemma 2.7.2 should remind any 
reader who has been exposed to a development of logarithms of the argument 
used in proving the familiar results that log 1 = 0 and log (1/x) = −log x; 
this is no coincidence, for the mapping φ: x → log x is a homomorphism of 
the group of positive real numbers under multiplication into the group of 
real numbers under addition, as we have seen in Example 2.7.6. 

Lemma 2.7.2 shows that e is in the kernel of any homomorphism, so any 
such kernel is not empty. But we can say even more. 

LEMMA 2.7.3 If φ is a homomorphism of G into Ḡ with kernel K, then K is a 
normal subgroup of G. 

Proof. First we must check whether K is a subgroup of G. To see this 
one must show that K is closed under multiplication and has inverses in it 
for every element belonging to K. 

If x, y ∈ K, then φ(x) = ē, φ(y) = ē, where ē is the identity element of 
Ḡ, and so φ(xy) = φ(x)φ(y) = ēē = ē, whence xy ∈ K. Also, if x ∈ K, 
φ(x) = ē, so, by Lemma 2.7.2, φ(x⁻¹) = φ(x)⁻¹ = ē⁻¹ = ē; thus 
x⁻¹ ∈ K. K is, accordingly, a subgroup of G. 

To prove the normality of K one must establish that for any g ∈ G, 
k ∈ K, gkg⁻¹ ∈ K; in other words, one must prove that φ(gkg⁻¹) = ē 
whenever φ(k) = ē. But φ(gkg⁻¹) = φ(g)φ(k)φ(g⁻¹) = φ(g)ēφ(g)⁻¹ = 
φ(g)φ(g)⁻¹ = ē. This completes the proof of Lemma 2.7.3. 
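
Lemma 2.7.3 can be watched at work in a small case. The sketch below is an 
illustration of our own, not taken from the text: it realizes S₃ as the 
permutations of {0, 1, 2}, takes for the homomorphism the sign of a 
permutation (an assumption made for the example; it plays the same role as 
the mapping f of Example 2.7.2), and checks by exhaustion that the kernel is 
closed, contains inverses, and is carried into itself by every conjugation. 

```python
from itertools import permutations

def compose(p, q):
    """The permutation 'first p, then q' on {0, 1, 2}."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    """The permutation undoing p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def sign(p):
    """+1 if p has an even number of inversions, -1 otherwise."""
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1

S3 = list(permutations(range(3)))

# sign is a homomorphism of S3 into {1, -1} under multiplication.
is_hom = all(sign(compose(p, q)) == sign(p) * sign(q) for p in S3 for q in S3)

# Its kernel: everything sent to the identity element 1.
K = {p for p in S3 if sign(p) == 1}

closed = all(compose(a, b) in K for a in K for b in K)
has_inverses = all(inverse(a) in K for a in K)
normal = all(compose(compose(g, k), inverse(g)) in K for g in S3 for k in K)

print(is_hom, closed, has_inverses, normal, len(K))
```
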

Let φ now be a homomorphism of the group G onto the group Ḡ, and 
suppose that K is the kernel of φ. If ḡ ∈ Ḡ, we say an element x ∈ G is an 
inverse image of ḡ under φ if φ(x) = ḡ. What are all the inverse images of 
ḡ? For ḡ = ē we have the answer, namely (by its very definition) K. 
What about elements ḡ ≠ ē? Well, suppose x ∈ G is one inverse image of ḡ; 
can we write down others? Clearly yes, for if k ∈ K, and if y = kx, then 
φ(y) = φ(kx) = φ(k)φ(x) = ēḡ = ḡ. Thus all the elements Kx are in 
the inverse image of ḡ whenever x is. Can there be others? Let us suppose 
that φ(z) = ḡ = φ(x). Ignoring the middle term we are left with 
φ(z) = φ(x), and so φ(z)φ(x)⁻¹ = ē. But φ(x)⁻¹ = φ(x⁻¹), whence 
ē = φ(z)φ(x)⁻¹ = φ(z)φ(x⁻¹) = φ(zx⁻¹), in consequence of which 
zx⁻¹ ∈ K; thus z ∈ Kx. In other words, we have shown that Kx accounts 
for exactly all the inverse images of ḡ whenever x is a single such inverse 
image. We record this as 

LEMMA 2.7.4 If φ is a homomorphism of G onto Ḡ with kernel K, then the set 
of all inverse images of ḡ ∈ Ḡ under φ in G is given by Kx, where x is any particular 
inverse image of ḡ in G. 
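
A small computation makes Lemma 2.7.4 concrete. In the sketch below, which 
is our own illustration, G is taken to be the integers modulo 12 under 
addition and the homomorphism sends x to its remainder modulo 4; both choices 
are assumptions made for the example (the mapping is well defined because 4 
divides 12). Since the group is written additively, the coset Kx appears as 
K + x. 

```python
n, m = 12, 4          # G = integers mod 12, image group = integers mod 4
G = range(n)

def phi(x):
    """Reduce modulo m; a homomorphism of Z_12 onto Z_4."""
    return x % m

# The kernel: all elements sent to the identity 0 of the image group.
K = {x for x in G if phi(x) == 0}

def coset(x):
    """The coset of K containing x (written K + x in additive notation)."""
    return {(k + x) % n for k in K}

def inverse_images(g_bar):
    """All elements of G that phi sends to g_bar."""
    return {x for x in G if phi(x) == g_bar}

# For every x, the inverse images of phi(x) are exactly the coset of x.
lemma_holds = all(inverse_images(phi(x)) == coset(x) for x in G)

print(sorted(K), lemma_holds)
```
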

A special case immediately presents itself, namely, the situation when 
K = (e). But here, by Lemma 2.7.4, any ḡ ∈ Ḡ has exactly one inverse 
image. That is, φ is a one-to-one mapping. The converse is trivially true, 
namely, if φ is a one-to-one homomorphism of G into (not even onto) Ḡ, its 
kernel must consist exactly of e. 

DEFINITION A homomorphism φ from G into Ḡ is said to be an 
isomorphism if φ is one-to-one. 

DEFINITION Two groups G, G* are said to be isomorphic if there is an 
isomorphism of G onto G*. In this case we write G ≈ G*. 

We leave it to the reader to verify the following three facts: 

1. G ≈ G. 

2. G ≈ G* implies G* ≈ G. 

3. G ≈ G*, G* ≈ G** implies G ≈ G**. 

When two groups are isomorphic, then, in some sense, they are equal. 
They differ in that their elements are labeled differently. The isomorphism 
gives us the key to the labeling, and with it, knowing a given computation 
in one group, we can carry out the analogous computation in the other. 
The isomorphism is like a dictionary which enables one to translate a 
sentence in one language into a sentence, of the same meaning, in another 
language. (Unfortunately no such perfect dictionary exists, for in languages 
words do not have single meanings, and nuances do not come through in a 
literal translation.) But merely to say that a given sentence in one language 
can be expressed in another is of little consequence; one needs the dictionary 
to carry out the translation. Similarly it might be of little consequence to 
know that two groups are isomorphic; the object of interest might very well 
be the isomorphism itself. So, whenever we prove two groups to be iso¬ 
morphic, we shall endeavor to exhibit the precise mapping which yields 
this isomorphism. 

Returning to Lemma 2.7.4 for a moment, we see in it a means of 
characterizing in terms of the kernel when a homomorphism is actually an 
isomorphism. 

COROLLARY A homomorphism φ of G into Ḡ with kernel K_φ is an isomorphism 
of G into Ḡ if and only if K_φ = (e). 

This corollary provides us with a standard technique for proving two 
groups to be isomorphic. First we find a homomorphism of one onto the 
other, and then prove the kernel of this homomorphism consists only of 
the identity element. This method will be illustrated for us in the proof 
of the very important 

THEOREM 2.7.1 Let φ be a homomorphism of G onto Ḡ with kernel K. Then 
G/K ≈ Ḡ. 

Proof. Consider the diagram 

         φ
    G ------> Ḡ
    |
  σ |
    v
   G/K

where σ(g) = Kg. 

We should like to complete this to 

         φ
    G ------> Ḡ
    |        ^
  σ |       / ψ
    v      /
   G/K ---

It seems clear that, in order to construct the mapping ψ from G/K to Ḡ, 
we should use G as an intermediary, and also that this construction should 
be relatively uncomplicated. What is more natural than to complete the 
diagram using 

         φ
    g ------> φ(g)
    |        ^
  σ |       / ψ
    v      /
    Kg ---

With this preamble we formally define the mapping ψ from G/K to Ḡ by: 
if X ∈ G/K, X = Kg, then ψ(X) = φ(g). A problem immediately arises: 
is this mapping well defined? If X ∈ G/K, it can be written as Kg in several 
ways (for instance, Kg = Kkg, k ∈ K); but if X = Kg = Kg', g, g' ∈ G, 
then on one hand ψ(X) = φ(g), and on the other, ψ(X) = φ(g'). For 
the mapping ψ to make sense it had better be true that φ(g) = φ(g'). 
So, suppose Kg = Kg'; then g = kg', where k ∈ K, hence φ(g) = φ(kg') = 
φ(k)φ(g') = ēφ(g') = φ(g') since k ∈ K, the kernel of φ. 

We next determine that ψ is onto. For, if x̄ ∈ Ḡ, x̄ = φ(g), g ∈ G (since 
φ is onto) so x̄ = φ(g) = ψ(Kg). 

If X, Y ∈ G/K, X = Kg, Y = Kf, g, f ∈ G, then XY = KgKf = Kgf, 
so that ψ(XY) = ψ(Kgf) = φ(gf) = φ(g)φ(f) since φ is a homomorphism 
of G onto Ḡ. But ψ(X) = ψ(Kg) = φ(g), ψ(Y) = ψ(Kf) = φ(f), so we 
see that ψ(XY) = ψ(X)ψ(Y), and ψ is a homomorphism of G/K onto Ḡ. 

To prove that ψ is an isomorphism of G/K onto Ḡ all that remains is to 
demonstrate that the kernel of ψ is the unit element of G/K. Since the unit 
element of G/K is K = Ke, we must show that if ψ(Kg) = ē, then Kg = 
Ke = K. This is now easy, for ē = ψ(Kg) = φ(g), so that φ(g) = ē, 
whence g is in the kernel of φ, namely K. But then Kg = K since K is a 
subgroup of G. All the pieces have been put together. We have exhibited 
a one-to-one homomorphism of G/K onto Ḡ. Thus G/K ≈ Ḡ, and Theorem 
2.7.1 is established. 
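
The construction in the proof can be carried out mechanically in a small 
case. The sketch below is an illustration under assumptions of our own 
choosing: G is the group of integers modulo 12 under addition, the 
homomorphism reduces modulo 3, and cosets are represented as frozensets. It 
builds ψ on the cosets exactly as in the proof and checks that ψ is well 
defined, a homomorphism, one-to-one, and onto. 

```python
n, m = 12, 3          # G = integers mod 12, image group = integers mod 3
G = list(range(n))

def phi(g):
    """Reduce modulo m; a homomorphism of Z_12 onto Z_3 since 3 divides 12."""
    return g % m

K = frozenset(g for g in G if phi(g) == 0)      # kernel of phi

def coset(g):
    """The coset of K containing g (K + g in additive notation)."""
    return frozenset((k + g) % n for k in K)

quotient = {coset(g) for g in G}                # the elements of G/K

# psi(Kg) = phi(g) makes sense only if phi is constant on each coset.
well_defined = all(len({phi(g) for g in X}) == 1 for X in quotient)

def psi(X):
    """Value of phi on any representative of the coset X."""
    return phi(min(X))

def coset_product(X, Y):
    """Product in G/K: the coset of the sum of representatives."""
    return coset(min(X) + min(Y))

is_hom = all(psi(coset_product(X, Y)) == (psi(X) + psi(Y)) % m
             for X in quotient for Y in quotient)
one_to_one = len({psi(X) for X in quotient}) == len(quotient)
onto = {psi(X) for X in quotient} == set(range(m))

print(well_defined, is_hom, one_to_one, onto)
```
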

Theorem 2.7.1 is important, for it tells us precisely what groups can be 
expected to arise as homomorphic images of a given group. These must be 
expressible in the form G/K, where K is normal in G. But, by Lemma 2.7.1, 
for any normal subgroup N of G, G/N is a homomorphic image of G. Thus 
there is a one-to-one correspondence between homomorphic images of G 
and normal subgroups of G. If one were to seek all homomorphic images of 
G one could do it by never leaving G as follows: find all normal subgroups 
N of G and construct all groups G/N. The set of groups so constructed 
yields all homomorphic images of G (up to isomorphisms). 

A group is said to be simple if it has no nontrivial homomorphic images, 
that is, if it has no nontrivial normal subgroups. A famous, long-standing 
conjecture was that a non-abelian simple group of finite order has an even 
number of elements. This important result has been proved by the two 
American mathematicians, Walter Feit and John Thompson. 

We have stated that the concept of a homomorphism is a very important 
one. To strengthen this statement we shall now show how the methods and 
results of this section can be used to prove nontrivial facts about groups. 
When we construct the group G/N, where N is normal in G, if we should 
happen to know the structure of G/N we would know that of G "up to N." 
True, we blot out a certain amount of information about G, but often 
enough is left so that from facts about G/N we can ascertain certain ones 
about G. When we photograph a certain scene we transfer a three- 
dimensional object to a two-dimensional representation of it. Yet, looking 
at the picture we can derive a great deal of information about the scene 
photographed. 

In the two applications of the ideas developed so far, which are given 
below, the proofs given are not the best possible. In fact, a little later in 
this chapter these results will be proved in a more general situation in an 
easier manner. We use the presentation here because it does illustrate 
effectively many group-theoretic concepts. 

APPLICATION 1 (Cauchy's Theorem for Abelian Groups) Suppose G 
is a finite abelian group and p | o(G), where p is a prime number. Then there is an 
element a ≠ e ∈ G such that a^p = e. 
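
Before reading the proof, the statement itself can be tested in a small 
abelian group. The sketch below is a brute-force check of our own and does 
not follow the inductive argument; the group, the direct product of the 
integers modulo 6 and modulo 10 under componentwise addition (order 60), and 
the helper names are assumptions made for the illustration. In additive 
notation a^p = e reads pa = 0. 

```python
from itertools import product

moduli = (6, 10)                       # G = Z_6 x Z_10, o(G) = 60
G = list(product(*(range(m) for m in moduli)))
identity = (0, 0)

def multiple(a, k):
    """The k-fold sum a + ... + a (the additive form of a^k)."""
    return tuple((k * x) % m for x, m in zip(a, moduli))

def element_killed_by(p):
    """Some a != identity with p*a = identity, or None if there is none."""
    for a in G:
        if a != identity and multiple(a, p) == identity:
            return a
    return None

# 2, 3 and 5 are the primes dividing 60; 7 does not divide 60.
witnesses = {p: element_killed_by(p) for p in (2, 3, 5)}
none_for_7 = element_killed_by(7) is None

print(witnesses, none_for_7)
```

The search for p = 7 comes back empty, as Lagrange's theorem requires. 
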

Proof. We proceed by induction over o(G). In other words, we assume 
that the theorem is true for all abelian groups having fewer elements than 
G. From this we wish to prove that the result holds for G. To start the 
induction we note that the theorem is vacuously true for groups having a 
single element. 

If G has no subgroups H ≠ (e), G, by the result of a problem earlier in 
the chapter, G must be cyclic of prime order. This prime must be p, and 
G certainly has p − 1 elements a ≠ e satisfying a^p = a^{o(G)} = e. 

So suppose G has a subgroup N ≠ (e), G. If p | o(N), by our induction 
hypothesis, since o(N) < o(G) and N is abelian, there is an element b ∈ N, 
b ≠ e, satisfying b^p = e; since b ∈ N ⊂ G we would have exhibited an 
element of the type required. So we may assume that p ∤ o(N). Since G 
is abelian, N is a normal subgroup of G, so G/N is a group. Moreover, 
o(G/N) = o(G)/o(N), and since p ∤ o(N), 

    p | o(G)/o(N) < o(G). 


Also, since G is abelian, G/N is abelian. Thus by our induction hypothesis 
there is an element X ∈ G/N satisfying X^p = e₁, the unit element of G/N, 
X ≠ e₁. By the very form of the elements of G/N, X = Nb, b ∈ G, so that 
X^p = (Nb)^p = Nb^p. Since e₁ = Ne, X^p = e₁, X ≠ e₁ translates into 
Nb^p = N, Nb ≠ N. Thus b^p ∈ N, b ∉ N. Using one of the corollaries to 
Lagrange's theorem, (b^p)^{o(N)} = e. That is, b^{o(N)p} = e. Let c = b^{o(N)}. 
Certainly c^p = e. In order to show that c is an element that satisfies the 
conclusion of the theorem we must finally show that c ≠ e. However, if 
c = e, b^{o(N)} = e, and so (Nb)^{o(N)} = N. Combining this with (Nb)^p = N, 
p ∤ o(N), p a prime number, we find that Nb = N, and so b ∈ N, a 
contradiction. Thus c ≠ e, c^p = e, and we have completed the induction. This 
proves the result. 

APPLICATION 2 (Sylow's Theorem for Abelian Groups) If G is an 
abelian group of order o(G), and if p is a prime number, such that p^α | o(G), 
p^{α+1} ∤ o(G), then G has a subgroup of order p^α. 

Proof. If α = 0, the subgroup (e) satisfies the conclusion of the result. 
So suppose α ≠ 0. Then p | o(G). By Application 1, there is an element 
a ≠ e ∈ G satisfying a^p = e. Let S = {x ∈ G | x^{p^n} = e some integer n}. 
Since a ∈ S, a ≠ e, it follows that S ≠ (e). We now assert that S is a 
subgroup of G. Since G is finite we must only verify that S is closed. If 
x, y ∈ S, x^{p^n} = e, y^{p^m} = e, so that (xy)^{p^{n+m}} = x^{p^{n+m}} y^{p^{n+m}} = e (we have 
used that G is abelian), proving that xy ∈ S. 

We next claim that o(S) = p^β with β an integer 0 < β ≤ α. For, if some 
prime q | o(S), q ≠ p, by the result of Application 1 there is an element 
c ∈ S, c ≠ e, satisfying c^q = e. However, c^{p^n} = e for some n since c ∈ S. 
Since p^n, q are relatively prime, we can find integers λ, μ such that λq + 
μp^n = 1, so that c = c¹ = c^{λq+μp^n} = (c^q)^λ (c^{p^n})^μ = e, contradicting c ≠ e. 
By Lagrange's theorem o(S) | o(G), so that β ≤ α. Suppose that β < α; 
consider the abelian group G/S. Since β < α and o(G/S) = o(G)/o(S), 
p | o(G/S), there is an element Sx, (x ∈ G) in G/S satisfying Sx ≠ S, 
(Sx)^{p^n} = S for some integer n > 0. But S = (Sx)^{p^n} = Sx^{p^n}, and so x^{p^n} ∈ S; 
consequently e = (x^{p^n})^{o(S)} = (x^{p^n})^{p^β} = x^{p^{n+β}}. Therefore, x satisfies the 
exact requirements needed to put it in S; in other words, x ∈ S. 
Consequently Sx = S contradicting Sx ≠ S. Thus β < α is impossible and we 
are left with the only alternative, namely, that β = α. S is the required 
subgroup of order p^α. 
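
The set S of the proof is easy to compute in an example. The sketch below is 
our own illustration, with the group chosen for convenience: the direct 
product of the integers modulo 12 and modulo 18 under componentwise addition, 
of order 216 = 2³·3³. One simplification is an assumption of the sketch 
rather than the text's definition: since the order of every element divides 
o(G), an element lies in S exactly when p^α times it is the identity, so only 
the single exponent p^α is tested. 

```python
from itertools import product

moduli = (12, 18)                      # G = Z_12 x Z_18, o(G) = 216
G = list(product(*(range(m) for m in moduli)))
identity = (0, 0)

def add(a, b):
    """The group operation: componentwise addition."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def multiple(a, k):
    """The k-fold sum of a (the additive form of a^k)."""
    return tuple((k * x) % m for x, m in zip(a, moduli))

def largest_power_dividing(p, order):
    """p**alpha, where p**alpha divides order and p**(alpha + 1) does not."""
    q = 1
    while order % (q * p) == 0:
        q *= p
    return q

def S(p):
    """Elements whose order is a power of p (tested with exponent p**alpha)."""
    p_alpha = largest_power_dividing(p, len(G))
    return {a for a in G if multiple(a, p_alpha) == identity}

results = {}
for p in (2, 3):
    Sp = S(p)
    closed = all(add(a, b) in Sp for a in Sp for b in Sp)
    results[p] = (len(Sp), closed)

print(results)
```
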

We strengthen the application slightly. Suppose T is another subgroup 
of G of order p^α, T ≠ S. Since G is abelian ST = TS, so that ST is a 
subgroup of G. By Theorem 2.5.1 

    o(ST) = o(S)o(T)/o(S ∩ T) = p^{2α}/o(S ∩ T) 

and since S ≠ T, o(S ∩ T) < p^α, leaving us with o(ST) = p^γ, γ > α. 
Since ST is a subgroup of G, o(ST) | o(G); thus p^γ | o(G) violating the fact 
that p^α is the largest power of p which divides o(G). Thus no such subgroup 
T exists, and S is the unique subgroup of order p^α. We have proved the 

COROLLARY If G is abelian of order o(G) and p^α | o(G), p^{α+1} ∤ o(G), there 
is a unique subgroup of G of order p^α. 

If we look at G = S₃, which is non-abelian, o(G) = 2·3, we see that G 
has 3 distinct subgroups of order 2, namely, {e, φ}, {e, φψ}, {e, φψ²}, so 
that the corollary asserting the uniqueness does not carry over to 
non-abelian groups. But Sylow's theorem holds for all finite groups. 

We leave the application and return to the general development. Suppose 
φ is a homomorphism of G onto Ḡ with kernel K, and suppose that H̄ is a 
subgroup of Ḡ. Let H = {x ∈ G | φ(x) ∈ H̄}. We assert that H is a 
subgroup of G and that H ⊃ K. That H ⊃ K is trivial, for if x ∈ K, φ(x) = ē 
is in H̄, so that K ⊂ H follows. Suppose now that x, y ∈ H; hence φ(x) ∈ H̄, 
φ(y) ∈ H̄ from which we deduce that φ(xy) = φ(x)φ(y) ∈ H̄. Therefore, 
xy ∈ H and H is closed under the product in G. Furthermore, if 
x ∈ H, φ(x) ∈ H̄ and so φ(x⁻¹) = φ(x)⁻¹ ∈ H̄ from which it follows that 
x⁻¹ ∈ H. All in all, our assertion has been established. What can we say 
in addition in case H̄ is normal in Ḡ? Let g ∈ G, h ∈ H; then φ(h) ∈ H̄, 
whence φ(ghg⁻¹) = φ(g)φ(h)φ(g)⁻¹ ∈ H̄, since H̄ is normal in Ḡ. Otherwise 
stated, ghg⁻¹ ∈ H, from which it follows that H is normal in G. One 
other point should be noted, namely, that the homomorphism φ from G 
onto Ḡ, when just considered on elements of H, induces a homomorphism 
of H onto H̄, with kernel exactly K, since K ⊂ H; by Theorem 2.7.1 we 
have that H̄ ≈ H/K. 

Suppose, conversely, that L is a subgroup of G and K ⊂ L. Let L̄ = 
{x̄ ∈ Ḡ | x̄ = φ(l), l ∈ L}. The reader should verify that L̄ is a subgroup 
of Ḡ. Can we explicitly describe the subgroup T = {y ∈ G | φ(y) ∈ L̄}? 
Clearly L ⊂ T. Is there any element t ∈ T which is not in L? So, suppose 
t ∈ T; thus φ(t) ∈ L̄, so by the very definition of L̄, φ(t) = φ(l) for some 
l ∈ L. Thus φ(tl⁻¹) = φ(t)φ(l)⁻¹ = ē, whence tl⁻¹ ∈ K ⊂ L, thus t is 
in Ll = L. Equivalently we have proved that T ⊂ L, which, combined 
with L ⊂ T, yields that L = T. 

Thus we have set up a one-to-one correspondence between the set of 
all subgroups of Ḡ and the set of all subgroups of G which contain K. 
Moreover, in this correspondence, a normal subgroup of G corresponds to a 
normal subgroup of Ḡ. 

We summarize these few paragraphs in 

LEMMA 2.7.5 Let φ be a homomorphism of G onto Ḡ with kernel K. For H̄ a 
subgroup of Ḡ let H be defined by H = {x ∈ G | φ(x) ∈ H̄}. Then H is a 
subgroup of G and H ⊃ K; if H̄ is normal in Ḡ, then H is normal in G. Moreover, 
this association sets up a one-to-one mapping from the set of all subgroups of Ḡ onto 
the set of all subgroups of G which contain K. 
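
The correspondence of Lemma 2.7.5 can be listed explicitly in a small cyclic 
example. The sketch below is our own illustration: G is the group of 
integers modulo 12 under addition, mapped onto the integers modulo 4 by 
reduction, and it relies on the standard fact (used here without proof) that 
every subgroup of a cyclic group is cyclic, so that all subgroups can be 
enumerated by their generators. 

```python
n, m = 12, 4          # G = integers mod 12, image group = integers mod 4

def phi(x):
    """Reduce modulo m; a homomorphism of Z_12 onto Z_4."""
    return x % m

def generated(d, order):
    """The cyclic subgroup of the integers mod `order` generated by d."""
    return frozenset((d * k) % order for k in range(order))

def subgroups(order):
    """All subgroups of the integers mod `order` (each one is cyclic)."""
    return {generated(d, order) for d in range(order)}

K = frozenset(x for x in range(n) if phi(x) == 0)    # kernel of phi

def pullback(H_bar):
    """H = {x in G | phi(x) in H_bar}, as in the statement of the lemma."""
    return frozenset(x for x in range(n) if phi(x) in H_bar)

pullbacks = {H_bar: pullback(H_bar) for H_bar in subgroups(m)}
containing_K = {L for L in subgroups(n) if K <= L}

all_are_subgroups = all(H in subgroups(n) for H in pullbacks.values())
all_contain_K = all(K <= H for H in pullbacks.values())
one_to_one = len(set(pullbacks.values())) == len(pullbacks)
onto = set(pullbacks.values()) == containing_K

print(all_are_subgroups, all_contain_K, one_to_one, onto)
```
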

We wish to prove one more general theorem about the relation of two 
groups which are homomorphic. 

THEOREM 2.7.2 Let φ be a homomorphism of G onto Ḡ with kernel K, and let 
N̄ be a normal subgroup of Ḡ, N = {x ∈ G | φ(x) ∈ N̄}. Then G/N ≈ Ḡ/N̄. 
Equivalently, G/N ≈ (G/K)/(N/K). 

Proof. As we already know, there is a homomorphism θ of Ḡ onto 
Ḡ/N̄ defined by θ(ḡ) = N̄ḡ. We define the mapping ψ: G → Ḡ/N̄ by 
ψ(g) = N̄φ(g) for all g ∈ G. To begin with, ψ is onto, for if ḡ ∈ Ḡ, 
ḡ = φ(g) for some g ∈ G, since φ is onto, so the typical element N̄ḡ in 
Ḡ/N̄ can be represented as N̄φ(g) = ψ(g). 

If a, b ∈ G, ψ(ab) = N̄φ(ab) by the definition of the mapping ψ. 
However, since φ is a homomorphism, φ(ab) = φ(a)φ(b). Thus ψ(ab) = 
N̄φ(a)φ(b) = N̄φ(a)N̄φ(b) = ψ(a)ψ(b). So far we have shown that ψ is 
a homomorphism of G onto Ḡ/N̄. What is the kernel, T, of ψ? Firstly, if 
n ∈ N, φ(n) ∈ N̄, so that ψ(n) = N̄φ(n) = N̄, the identity element of 
Ḡ/N̄, proving that N ⊂ T. On the other hand, if t ∈ T, ψ(t) = identity 
element of Ḡ/N̄ = N̄; but ψ(t) = N̄φ(t). Comparing these two evaluations 
of ψ(t), we arrive at N̄ = N̄φ(t), which forces φ(t) ∈ N̄; but this places 
t in N by definition of N. That is, T ⊂ N. The kernel of ψ has been proved 
to be equal to N. But then ψ is a homomorphism of G onto Ḡ/N̄ with 
kernel N. By Theorem 2.7.1 G/N ≈ Ḡ/N̄, which is the first part of the 
theorem. The last statement in the theorem is immediate from the 
observation (following as a consequence of Theorem 2.7.1) that Ḡ ≈ G/K, 
N̄ ≈ N/K, Ḡ/N̄ ≈ (G/K)/(N/K). 

Problems 

1. In the following, verify if the mappings defined are homomorphisms, 
and in those cases in which they are homomorphisms, determine the 
kernel. 

(a) G is the group of nonzero real numbers under multiplication, 
Ḡ = G, φ(x) = x² all x ∈ G. 

(b) G, Ḡ as in (a), φ(x) = 2^x. 

(c) G is the group of real numbers under addition, Ḡ = G, φ(x) = 
x + 1 all x ∈ G. 

(d) G, Ḡ as in (c), φ(x) = 13x for x ∈ G. 

(e) G is any abelian group, Ḡ = G, φ(x) = x⁵ all x ∈ G. 

2. Let G be any group, g a fixed element in G. Define φ: G → G by 
φ(x) = gxg⁻¹. Prove that φ is an isomorphism of G onto G. 

3. Let G be a finite abelian group of order o(G) and suppose the integer 
n is relatively prime to o(G). Prove that every g ∈ G can be written 
as g = xⁿ with x ∈ G. (Hint: Consider the mapping φ: G → G 
defined by φ(y) = yⁿ, and prove this mapping is an isomorphism 
of G onto G.) 

4. (a) Given any group G and a subset U, let Û be the smallest 
subgroup of G which contains U. Prove there is such a subgroup Û 
in G. (Û is called the subgroup generated by U.) 

(b) If gug⁻¹ ∈ U for all g ∈ G, u ∈ U, prove that Û is a normal 
subgroup of G. 

5. Let U = {xyx⁻¹y⁻¹ | x, y ∈ G}. In this case Û is usually written as 
G' and is called the commutator subgroup of G. 

(a) Prove that G' is normal in G. 

(b) Prove that G/G' is abelian. 

(c) If G/N is abelian, prove that N ⊃ G'. 

(d) Prove that if H is a subgroup of G and H ⊃ G', then H is normal 
in G. 

6. If N, M are normal subgroups of G, prove that NM/M ≈ N/N ∩ M. 

7. Let V be the set of real numbers, and for a, b real, a ≠ 0 let 
τ_{ab}: V → V defined by τ_{ab}(x) = ax + b. Let G = {τ_{ab} | a, b real, 
a ≠ 0} and let N = {τ_{1b} ∈ G}. Prove that N is a normal subgroup 
of G and that G/N ≈ group of nonzero real numbers under 
multiplication. 

8. Let G be the dihedral group defined as the set of all formal symbols 
x^i y^j, i = 0, 1, j = 0, 1, ..., n − 1, where x² = e, yⁿ = e, xy = 
y⁻¹x. Prove 

(a) The subgroup N = {e, y, y², ..., y^{n−1}} is normal in G. 

(b) That G/N ≈ W, where W = {1, −1} is the group under 
the multiplication of the real numbers. 

9. Prove that the center of a group is always a normal subgroup. 

10. Prove that a group of order 9 is abelian. 

11. If G is a non-abelian group of order 6, prove that G « S 3 . 

12. If G is abelian and if N is any subgroup of G, prove that G/N is 
abelian. 

13. Let G be the dihedral group defined in Problem 8. Find the center 
of G. 

14. Let G be as in Problem 13. Find G', the commutator subgroup of G. 

15. Let G be the group of nonzero complex numbers under multiplication 
and let N be the set of complex numbers of absolute value 1 (that is, 
a + bi ∈ N if a² + b² = 1). Show that G/N is isomorphic to the 
group of all positive real numbers under multiplication. 

#16. Let G be the group of all nonzero complex numbers under 
multiplication and let Ḡ be the group of all real 2×2 matrices of the form 

    (  a  b )
    ( −b  a ), 

where not both a and b are 0, under matrix multiplication. 
Show that G and Ḡ are isomorphic by exhibiting an isomorphism of 
G onto Ḡ. 

*17. Let G be the group of real numbers under addition and let N be the 
subgroup of G consisting of all the integers. Prove that G/N is 
isomorphic to the group of all complex numbers of absolute value 1 
under multiplication. 

#18. Let G be the group of all real 2×2 matrices 

    ( a  b )
    ( c  d ), 

with ad − bc ≠ 0, under matrix multiplication, and let N be the set of 
all such matrices in G with ad − bc = 1. 

Prove that N ⊃ G', the commutator subgroup of G. 

*#19. In Problem 18 show, in fact, that N = G'. 

#20. Let G be the group of all real 2×2 matrices of the form 

    ( a  b )
    ( 0  d ) 

where ad ≠ 0, under matrix multiplication. Show that G' is precisely 
the set of all matrices of the form 

    ( 1  x )
    ( 0  1 ). 

21. Let S₁ and S₂ be two sets. Suppose that there exists a one-to-one 
mapping ψ of S₁ into S₂. Show that there exists an isomorphism of 
A(S₁) into A(S₂), where A(S) means the set of all one-to-one mappings 
of S onto itself. 

2.8 Automorphisms 

In the preceding section the concept of an isomorphism of one group into 
another was defined and examined. The special case in which the isomor¬ 
phism maps a given group into itself should obviously be of some importance. 
We use the word “into” advisedly, for groups G do exist which have iso¬ 
morphisms mapping G into, and not onto, itself. The easiest such example 
is the following: Let G be the group of integers under addition and define 
φ: G → G by φ: x → 2x for every x ∈ G. Since φ: x + y → 2(x + y) = 
2x + 2y, φ is a homomorphism. Also if the image of x and y under φ are 
equal, then 2x = 2y whence x = y. φ is thus an isomorphism. Yet φ is 
not onto, for the image of any integer under φ is an even integer, so, for 
instance, 1 does not appear as an image under φ of any element of G. Of 
greatest interest to us will be the isomorphisms of a group onto itself. 

DEFINITION By an automorphism of a group G we shall mean an isomorphism 
of G onto itself. 

As we mentioned in Chapter 1, whenever we talk about mappings of a set 
into itself we shall write the mappings on the right side, thus if T: S → S, 
x ∈ S, then xT is the image of x under T. 

Let I be the mapping of G which sends every element onto itself, that is, 
xI = x for all x ∈ G. Trivially I is an automorphism of G. Let 𝒜(G) denote 
the set of all automorphisms of G; being a subset of A(G), the set of 
one-to-one mappings of G onto itself, for elements of 𝒜(G) we can use the product 
of A(G), namely, composition of mappings. This product then satisfies the 
associative law in A(G), and so, a fortiori, in 𝒜(G). Also I, the unit element 
of A(G), is in 𝒜(G), so 𝒜(G) is not empty. 

An obvious fact that we should try to establish is that 𝒜(G) is a subgroup 
of A(G), and so, in its own right, 𝒜(G) should be a group. If T₁, T₂ are 
in 𝒜(G) we already know that T₁T₂ ∈ A(G). We want it to be in the 
smaller set 𝒜(G). We proceed to verify this. For all x, y ∈ G, 

(xy)T₁ = (xT₁)(yT₁), 

(xy)T₂ = (xT₂)(yT₂), 

therefore 

(xy)T₁T₂ = ((xy)T₁)T₂ = ((xT₁)(yT₁))T₂ 

= ((xT₁)T₂)((yT₁)T₂) = (xT₁T₂)(yT₁T₂). 

That is, T₁T₂ ∈ 𝒜(G). There is only one other fact that needs verifying 
in order that 𝒜(G) be a subgroup of A(G), namely, that if T ∈ 𝒜(G), then 
T⁻¹ ∈ 𝒜(G). If x, y ∈ G, then 

((xT⁻¹)(yT⁻¹))T = ((xT⁻¹)T)((yT⁻¹)T) = (xI)(yI) = xy, 

thus 

(xT⁻¹)(yT⁻¹) = (xy)T⁻¹, 

placing T⁻¹ in 𝒜(G). Summarizing these remarks, we have proved 

LEMMA 2.8.1 If G is a group, then 𝒜(G), the set of automorphisms of G, is 
also a group. 


Of course, as yet, we have no way of knowing that st(G), in general, has 
elements other than I. If G is a group having only two elements, the reader 
should convince himself that s/(G) consists only of I. For groups G with 
more than two elements, s/(G) always has more than one element. 

What we should like is a richer sample of automorphisms than the ones 
Ave have (namely, I). If the group G is abelian and there is some element 
*0 £ G satisfying x 0 =£ x 0 1 , we can write down an explicit automorphism, 
the mapping T defined by xT = x 1 for all x e G. For any group G, T is 
Onto; xor any abelian G, (xy) T = [xy) ~ 1 =y _1 x~ 1 — x~ 1 = (xT)(yT). 
Also x 0 T = * 0 “ 1 ^ * 0 , so T =£ I. 

However, the class of abelian groups is a little limited, and we should
like to have some automorphisms of non-abelian groups. Strangely enough
the task of finding automorphisms for such groups is easier than for abelian
groups.

Let G be a group; for $g \in G$ define $T_g: G \to G$ by $xT_g = g^{-1}xg$ for all
$x \in G$. We claim that $T_g$ is an automorphism of G. First, $T_g$ is onto, for
given $y \in G$, let $x = gyg^{-1}$. Then $xT_g = g^{-1}(x)g = g^{-1}(gyg^{-1})g = y$, so
$T_g$ is onto. Now consider, for $x, y \in G$, $(xy)T_g = g^{-1}(xy)g = g^{-1}(xgg^{-1}y)g =
(g^{-1}xg)(g^{-1}yg) = (xT_g)(yT_g)$. Consequently $T_g$ is a homomorphism of G
onto itself. We further assert that $T_g$ is one-to-one, for if $xT_g = yT_g$, then
$g^{-1}xg = g^{-1}yg$, so by the cancellation laws in G, $x = y$. $T_g$ is called the
inner automorphism corresponding to g. If G is non-abelian, there is a pair
$a, b \in G$ such that $ab \neq ba$; but then $bT_a = a^{-1}ba \neq b$, so that $T_a \neq I$.
Thus for a non-abelian group G there always exist nontrivial automorphisms.
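
The verification above can be replayed by machine on a small non-abelian group. The following is a minimal sketch, assuming we model $S_3$ as tuples of images of the points 0, 1, 2; the helper names `mul`, `inv`, `inner` and `is_automorphism` are our own illustrative choices, not notation from the text. Products follow the text's right-hand convention (in `mul(p, q)` the map p acts first).

```python
from itertools import permutations

# S_3 as tuples p, where p[i] is the image of the point i (points 0, 1, 2).
G = list(permutations(range(3)))

def mul(p, q):
    """Product pq in the text's right-hand convention: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inv(p):
    """Inverse of the permutation p."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def inner(g):
    """The map T_g : x -> g^{-1} x g, stored as a dict on G."""
    return {x: mul(mul(inv(g), x), g) for x in G}

def is_automorphism(T):
    """Check that T is a bijection of G that preserves products."""
    bijective = sorted(T.values()) == sorted(G)
    multiplicative = all(T[mul(x, y)] == mul(T[x], T[y]) for x in G for y in G)
    return bijective and multiplicative

identity_map = {x: x for x in G}
all_inner_are_automorphisms = all(is_automorphism(inner(g)) for g in G)
nontrivial = [g for g in G if inner(g) != identity_map]
```

Here `nontrivial` collects the elements g for which $T_g \neq I$; since $S_3$ is non-abelian the list is not empty, in line with the last sentence above.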

Let $\mathcal{I}(G) = \{T_g \in \mathcal{A}(G) \mid g \in G\}$. The computation of $T_{gh}$, for $g, h \in G$,
might be of some interest. So, suppose $x \in G$; by definition,

$$xT_{gh} = (gh)^{-1}x(gh) = h^{-1}g^{-1}xgh = (g^{-1}xg)T_h = (xT_g)T_h = xT_gT_h.$$

Looking at the start and finish of this chain of equalities we find that
$T_{gh} = T_gT_h$. This little remark is both interesting and suggestive. It is of
interest because it immediately yields that $\mathcal{I}(G)$ is a subgroup of $\mathcal{A}(G)$.
(Verify!) $\mathcal{I}(G)$ is usually called the group of inner automorphisms of G. It is
suggestive, for if we consider the mapping $\psi: G \to \mathcal{A}(G)$ defined by
$\psi(g) = T_g$ for every $g \in G$, then $\psi(gh) = T_{gh} = T_gT_h = \psi(g)\psi(h)$. That
is, $\psi$ is a homomorphism of G into $\mathcal{A}(G)$ whose image is $\mathcal{I}(G)$. What is
the kernel of $\psi$? Suppose we call it K, and suppose $g_0 \in K$. Then $\psi(g_0) = I$,
or, equivalently, $T_{g_0} = I$. But this says that for any $x \in G$, $xT_{g_0} = x$;
however, $xT_{g_0} = g_0^{-1}xg_0$, and so $x = g_0^{-1}xg_0$ for all $x \in G$. Thus $g_0x =
g_0g_0^{-1}xg_0 = xg_0$; $g_0$ must commute with all elements of G. But the center
of G, Z, was defined to be precisely all elements in G which commute with
every element of G. (See Problem 15, Section 2.5.) Thus $K \subset Z$. However,
if $z \in Z$, then $xT_z = z^{-1}xz = z^{-1}(zx)$ (since $zx = xz$) $= x$, whence
$T_z = I$ and so $z \in K$. Therefore, $Z \subset K$. Having proved both $K \subset Z$
and $Z \subset K$ we have that $Z = K$. Summarizing, $\psi$ is a homomorphism of
G into $\mathcal{A}(G)$ with image $\mathcal{I}(G)$ and kernel Z. By Theorem 2.7.1
$\mathcal{I}(G) \approx G/Z$. In order to emphasize this general result we record it as

LEMMA 2.8.2 $\mathcal{I}(G) \approx G/Z$, where $\mathcal{I}(G)$ is the group of inner automorphisms
of G, and Z is the center of G.
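
As a numerical illustration of Lemma 2.8.2, one can count the distinct inner automorphisms of a small group and compare with $o(G)/o(Z)$. The sketch below assumes the dihedral group of order 8, realized (our choice, not the text's) as permutations of the four vertices 0, 1, 2, 3 of a square; `generate` is a hypothetical helper that closes a set of generators under multiplication.

```python
def mul(p, q):
    """Product pq: apply p first, then q (the text's right-hand convention)."""
    return tuple(q[p[i]] for i in range(len(p)))

def inv(p):
    """Inverse of the permutation p."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def generate(gens):
    """All products of the generators; in a finite group this is the subgroup they generate."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = mul(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return sorted(group)

rotation = (1, 2, 3, 0)      # i -> i + 1 mod 4
reflection = (0, 3, 2, 1)    # fixes 0 and 2, interchanges 1 and 3
G = generate([rotation, reflection])

# The center Z: elements commuting with every element of G.
center = [z for z in G if all(mul(z, x) == mul(x, z) for x in G)]

# Each T_g is recorded by its list of values on G; equal lists mean equal maps.
inner_maps = {tuple(mul(mul(inv(g), x), g) for x in G) for g in G}
```

For this group the count of distinct maps $T_g$ times the order of the center recovers the order of G, as the isomorphism $\mathcal{I}(G) \approx G/Z$ requires.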

Suppose that $\phi$ is an automorphism of a group G, and suppose that
$a \in G$ has order n (that is, $a^n = e$ but for no lower positive power). Then
$\phi(a)^n = \phi(a^n) = \phi(e) = e$, hence $\phi(a)^n = e$. If $\phi(a)^m = e$ for some
$0 < m < n$, then $\phi(a^m) = \phi(a)^m = e$, which implies, since $\phi$ is one-to-one,
that $a^m = e$, a contradiction. Thus

LEMMA 2.8.3 Let G be a group and $\phi$ an automorphism of G. If $a \in G$ is
of order $o(a) > 0$, then $o(\phi(a)) = o(a)$.

Automorphisms of groups can be used as a means of constructing new
groups from the original group. Before explaining this abstractly, we consider
a particular example.

Let G be a cyclic group of order 7, that is, G consists of all $a^i$, where we
assume $a^7 = e$. The mapping $\phi: a^i \to a^{2i}$, as can be checked trivially, is
an automorphism of G of order 3, that is, $\phi^3 = I$. Let x be a symbol which
we formally subject to the following conditions: $x^3 = e$, $x^{-1}a^ix = \phi(a^i) =
a^{2i}$, and consider all formal symbols $x^ia^j$, where $i = 0, 1, 2$ and
$j = 0, 1, 2, \ldots, 6$. We declare that $x^ia^j = x^ka^l$ if and only if $i \equiv k \bmod 3$
and $j \equiv l \bmod 7$. We multiply these symbols using the rules $x^3 = a^7 = e$,
$x^{-1}ax = a^2$. For instance, $(xa)(xa^2) = x(ax)a^2 = x(xa^2)a^2 = x^2a^4$. The
reader can verify that one obtains, in this way, a non-abelian group of
order 21.
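
The verification left to the reader can be sketched by brute force. We assume (our encoding, not the text's) that the symbol $x^ia^j$ is stored as the pair `(i, j)` with i taken mod 3 and j mod 7. The rule $x^{-1}ax = a^2$ gives $ax = xa^2$, hence $a^jx^k = x^ka^{j \cdot 2^k}$, which yields the product formula used in `mul`.

```python
from itertools import product

def mul(u, v):
    """(x^i a^j)(x^k a^l) = x^(i+k) a^(j*2^k + l), using a^j x^k = x^k a^(j*2^k)."""
    (i, j), (k, l) = u, v
    return ((i + k) % 3, (j * pow(2, k, 7) + l) % 7)

G = list(product(range(3), range(7)))   # all symbols x^i a^j
e = (0, 0)

associative = all(
    mul(mul(u, v), w) == mul(u, mul(v, w)) for u in G for v in G for w in G
)
has_identity = all(mul(e, u) == u == mul(u, e) for u in G)
has_inverses = all(any(mul(u, v) == e for v in G) for u in G)
abelian = all(mul(u, v) == mul(v, u) for u in G for v in G)

# The worked product from the text: (xa)(xa^2).
example = mul((1, 1), (1, 2))
```

The value of `example` is the pair standing for $x^2a^4$, matching the computation in the text. The formula is well defined on exponents mod 3 because $2^3 \equiv 1 \bmod 7$, which is the statement $\phi^3 = I$.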

Generally, if G is a group, T an automorphism of order r of G which is
not an inner automorphism, pick a symbol x and consider all elements
$x^ig$, $i = 0, \pm 1, \pm 2, \ldots$, $g \in G$ subject to $x^ig = x^{i'}g'$ if and only if
$i \equiv i' \bmod r$, $g = g'$ and $x^{-i}gx^i = gT^i$ for all i. This way we obtain a larger
group $\{G, T\}$; G is normal in $\{G, T\}$ and $\{G, T\}/G \approx$ group generated by
T = cyclic group of order r.

We close the section by determining $\mathcal{A}(G)$ for all cyclic groups.

Example 2.8.1 Let G be a finite cyclic group of order r, $G = (a)$, $a^r = e$.
Suppose T is an automorphism of G. If $aT$ is known, since $a^iT = (aT)^i$,
$a^iT$ is determined, so $gT$ is determined for all $g \in G = (a)$. Thus we need
consider only possible images of a under T. Since $aT \in G$, and since every
element in G is a power of a, $aT = a^t$ for some integer $0 < t < r$. However,
since T is an automorphism, $aT$ must have the same order as a (Lemma
2.8.3), and this condition, we claim, forces t to be relatively prime to r. For
if $d \mid t$, $d \mid r$, then $(aT)^{r/d} = a^{t(r/d)} = a^{r(t/d)} = (a^r)^{t/d} = e$; thus $aT$ has order
a divisor of $r/d$, which, combined with the fact that $aT$ has order r, leads
us to $d = 1$. Conversely, for any $0 < s < r$ and relatively prime to r, the
mapping $S: a^i \to a^{si}$ is an automorphism of G. Thus $\mathcal{A}(G)$ is in one-to-one
correspondence with the group $U_r$ of integers less than r and relatively
prime to r under multiplication modulo r. We claim not only is there such
a one-to-one correspondence, but there is one which furthermore is an
isomorphism. Let us label the elements of $\mathcal{A}(G)$ as $T_i$ where $T_i: a \to a^i$,
$0 < i < r$ and relatively prime to r; $T_iT_j: a \to a^i \to (a^i)^j = a^{ij}$, thus
$T_iT_j = T_{ij}$. The mapping $i \to T_i$ exhibits the isomorphism of $U_r$ onto
$\mathcal{A}(G)$. Here then, $\mathcal{A}(G) \approx U_r$.
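
The conclusion of Example 2.8.1 is easy to test numerically. In this sketch we assume the cyclic group of order r is written through its exponents, so that $a^i$ is represented by i mod r and the map $a^i \to a^{ti}$ becomes multiplication by t; the function names are ours.

```python
from math import gcd

def T(t, r):
    """The map a^i -> a^(t*i) on the cyclic group of order r, as a tuple of exponents."""
    return tuple((t * i) % r for i in range(r))

def is_automorphism(t, r):
    """a^i -> a^(t*i) always preserves products; it is an automorphism iff it is onto."""
    return len(set(T(t, r))) == r

def automorphism_exponents(r):
    """All t with 0 < t < r for which a -> a^t extends to an automorphism."""
    return [t for t in range(1, r) if is_automorphism(t, r)]

def units(r):
    """U_r: the integers 0 < t < r relatively prime to r."""
    return [t for t in range(1, r) if gcd(t, r) == 1]

def compose(S, U):
    """Apply S first, then U (right-hand convention)."""
    return tuple(U[S[i]] for i in range(len(S)))
```

For instance, with r = 12 the admissible exponents are exactly the members of $U_{12}$, and `compose(T(5, 12), T(7, 12))` agrees with `T(35 % 12, 12)`, which is the rule $T_iT_j = T_{ij}$.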

Example 2.8.2 G is an infinite cyclic group. That is, G consists of all $a^i$,
$i = 0, \pm 1, \pm 2, \ldots$, where we assume that $a^i = e$ if and only if $i = 0$.
Suppose that T is an automorphism of G. As in Example 2.8.1, $aT = a^t$.
The question now becomes, What values of t are possible? Since T is an
automorphism of G, it maps G onto itself, so that $a = gT$ for some $g \in G$.
Thus $a = a^iT = (aT)^i$ for some integer i. Since $aT = a^t$, we must have
that $a = a^{ti}$, so that $a^{ti-1} = e$. Hence $ti - 1 = 0$; that is, $ti = 1$. Clearly,
since t and i are integers, this must force $t = \pm 1$, and each of these gives
rise to an automorphism, $t = 1$ yielding the identity automorphism I,
$t = -1$ giving rise to the automorphism $T: g \to g^{-1}$ for every g in the
cyclic group G. Thus here, $\mathcal{A}(G) \approx$ cyclic group of order 2.


Problems 

1. Are the following mappings automorphisms of their respective groups?

(a) G group of integers under addition, $T: x \to -x$.

(b) G group of positive reals under multiplication, $T: x \to x^2$.

(c) G cyclic group of order 12, $T: x \to x^3$.

(d) G is the group $S_3$, $T: x \to x^{-1}$.

2. Let G be a group, H a subgroup of G, T an automorphism of G.
Let $(H)T = \{hT \mid h \in H\}$. Prove $(H)T$ is a subgroup of G.

3. Let G be a group, T an automorphism of G, N a normal subgroup of
G. Prove that $(N)T$ is a normal subgroup of G.

4. For $G = S_3$ prove that $G \approx \mathcal{I}(G)$.

5. For any group G prove that $\mathcal{I}(G)$ is a normal subgroup of $\mathcal{A}(G)$ (the
group $\mathcal{A}(G)/\mathcal{I}(G)$ is called the group of outer automorphisms of G).

6. Let G be a group of order 4, $G = \{e, a, b, ab\}$, $a^2 = b^2 = e$, $ab = ba$.
Determine $\mathcal{A}(G)$.

7. (a) A subgroup C of G is said to be a characteristic subgroup of G if
$(C)T \subset C$ for all automorphisms T of G. Prove a characteristic
subgroup of G must be a normal subgroup of G.

(b) Prove that the converse of (a) is false.

8. For any group G, prove that the commutator subgroup $G'$ is a
characteristic subgroup of G. (See Problem 5, Section 2.7.)

9. If G is a group, N a normal subgroup of G, M a characteristic subgroup
of N, prove that M is a normal subgroup of G.

10. Let G be a finite group, T an automorphism of G with the property
that $xT = x$ for $x \in G$ if and only if $x = e$. Prove that every $g \in G$
can be represented as $g = x^{-1}(xT)$ for some $x \in G$.

11. Let G be a finite group, T an automorphism of G with the property
that $xT = x$ if and only if $x = e$. Suppose further that $T^2 = I$.
Prove that G must be abelian.

*12. Let G be a finite group and suppose the automorphism T sends more
than three-quarters of the elements of G onto their inverses. Prove
that $xT = x^{-1}$ for all $x \in G$ and that G is abelian.

13. In Problem 12, can you find an example of a finite group which is
non-abelian and which has an automorphism which maps exactly
three-quarters of the elements of G onto their inverses?

#14. Prove that every finite group having more than two elements has a
nontrivial automorphism.

*15. Let G be a group of order 2n. Suppose that half of the elements of G
are of order 2, and the other half form a subgroup H of order n. Prove
that H is of odd order and is an abelian subgroup of G.

*16. Let $\phi(n)$ be the Euler $\phi$-function. If $a > 1$ is an integer, prove that
$n \mid \phi(a^n - 1)$.

17. Let G be a group and Z the center of G. If T is any automorphism
of G, prove that $(Z)T \subset Z$.

18. Let G be a group and T an automorphism of G. If, for $a \in G$, $N(a) =
\{x \in G \mid xa = ax\}$, prove that $N(aT) = (N(a))T$.

19. Let G be a group and T an automorphism of G. If N is a normal
subgroup of G such that $(N)T \subset N$, show how you could use T to
define an automorphism of $G/N$.

20. Use the discussion following Lemma 2.8.3 to construct

(a) a non-abelian group of order 55.

(b) a non-abelian group of order 203.

21. Let G be the group of order 9 generated by elements a, b, where $a^3 =
b^3 = e$. Find all the automorphisms of G.


2.9 Cayley's Theorem 


When groups first arose in mathematics they usually came from some specific
source and in some very concrete form. Very often it was in the form of a
set of transformations of some particular mathematical object. In fact,
most finite groups appeared as groups of permutations, that is, as subgroups
of $S_n$. ($S_n = A(S)$ when S is a finite set with n elements.) The English
mathematician Cayley first noted that every group could be realized as a
subgroup of $A(S)$ for some S. Our concern, in this section, will be with a
presentation of Cayley's theorem and some related results.

THEOREM 2.9.1 (Cayley) Every group is isomorphic to a subgroup of
$A(S)$ for some appropriate S.

Proof. Let G be a group. For the set S we will use the elements of G;
that is, put $S = G$. If $g \in G$, define $\tau_g: S(= G) \to S(= G)$ by $x\tau_g = xg$
for every $x \in G$. If $y \in G$, then $y = (yg^{-1})g = (yg^{-1})\tau_g$, so that $\tau_g$ maps
S onto itself. Moreover, $\tau_g$ is one-to-one, for if $x, y \in S$ and $x\tau_g = y\tau_g$,
then $xg = yg$, which, by the cancellation property of groups, implies that
$x = y$. We have proved that for every $g \in G$, $\tau_g \in A(S)$.

If $g, h \in G$, consider $\tau_{gh}$. For any $x \in S = G$, $x\tau_{gh} = x(gh) = (xg)h =
(x\tau_g)\tau_h = x\tau_g\tau_h$. Note that we used the associative law in a very essential
way here. From $x\tau_{gh} = x\tau_g\tau_h$ we deduce that $\tau_{gh} = \tau_g\tau_h$. Therefore, if
$\psi: G \to A(S)$ is defined by $\psi(g) = \tau_g$, the relation $\tau_{gh} = \tau_g\tau_h$ tells us that $\psi$
is a homomorphism. What is the kernel K of $\psi$? If $g_0 \in K$, then $\psi(g_0) = \tau_{g_0}$
is the identity map on S, so that for $x \in G$, and, in particular, for $e \in G$,
$e\tau_{g_0} = e$. But $e\tau_{g_0} = eg_0 = g_0$. Thus comparing these two expressions for
$e\tau_{g_0}$ we conclude that $g_0 = e$, whence $K = (e)$. Thus by the corollary to
Lemma 2.7.4 $\psi$ is an isomorphism of G into $A(S)$, proving the theorem.
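
The construction in the proof can be carried out explicitly for a small group. The sketch below assumes $G = S_3$, listed in a fixed order so that each $\tau_g$ can be written as a permutation of the positions 0, ..., 5 of that list; `index`, `tau` and `images` are illustrative names of ours.

```python
from itertools import permutations

G = list(permutations(range(3)))             # the group S_3, in a fixed order
index = {x: n for n, x in enumerate(G)}      # position of each element in the list

def mul(p, q):
    """Product pq: apply p first, then q (the text's right-hand convention)."""
    return tuple(q[p[i]] for i in range(len(p)))

def tau(g):
    """tau_g : x -> xg, written as a permutation of the positions 0..5 of G."""
    return tuple(index[mul(x, g)] for x in G)

images = {g: tau(g) for g in G}

# Each tau_g is a one-to-one map of S = G onto itself.
is_permutation = all(sorted(t) == list(range(len(G))) for t in images.values())

# tau_{gh} = tau_g tau_h, so g -> tau_g is a homomorphism.
is_homomorphism = all(
    images[mul(g, h)] == mul(images[g], images[h]) for g in G for h in G
)

# Trivial kernel: distinct g give distinct tau_g.
is_one_to_one = len(set(images.values())) == len(G)
```

Note that `mul` is reused for the permutations of six positions, since it only depends on the length of its arguments. The image of $S_3$ has 6 elements inside a group $A(S)$ of 6! = 720 elements.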

The theorem enables us to exhibit any abstract group as a more concrete
object, namely, as a group of mappings. However, it has its shortcomings;
for if G is a finite group of order $o(G)$, then, using $S = G$, as in our proof,
$A(S)$ has $o(G)!$ elements. Our group G of order $o(G)$ is somewhat lost in
the group $A(S)$ which, with its $o(G)!$ elements, is huge in comparison to G.
We ask: Can we find a more economical S, one for which $A(S)$ is smaller?
This we now attempt to accomplish.

Let G be a group, H a subgroup of G. Let S be the set whose elements
are the right cosets of H in G. That is, $S = \{Hg \mid g \in G\}$. S need not be a
group itself, in fact, it would be a group only if H were a normal subgroup
of G. However, we can make our group G act on S in the following natural
way: for $g \in G$ let $t_g: S \to S$ be defined by $(Hx)t_g = Hxg$. Emulating the
proof of Theorem 2.9.1 we can easily prove

1. $t_g \in A(S)$ for every $g \in G$.

2. $t_{gh} = t_gt_h$.

Thus the mapping $\theta: G \to A(S)$ defined by $\theta(g) = t_g$ is a homomorphism of
G into $A(S)$. Can one always say that $\theta$ is an isomorphism? Suppose that K
is the kernel of $\theta$. If $g_0 \in K$, then $\theta(g_0) = t_{g_0}$ is the identity map on S, so
that for every $X \in S$, $Xt_{g_0} = X$. Since every element of S is a right coset of
H in G, we must have that $Hat_{g_0} = Ha$ for every $a \in G$, and using the
definition of $t_{g_0}$, namely, $Hat_{g_0} = Hag_0$, we arrive at the identity $Hag_0 = Ha$
for every $a \in G$. On the other hand, if $b \in G$ is such that $Hxb = Hx$ for
every $x \in G$, retracing our argument we could show that $b \in K$. Thus
$K = \{b \in G \mid Hxb = Hx \text{ all } x \in G\}$. We claim that from this
characterization of K, K must be the largest normal subgroup of G which is contained
in H. We first explain the use of the word largest; by this we mean that if
N is a normal subgroup of G which is contained in H, then N must be
contained in K. We wish to show this is the case. That K is a normal subgroup
of G follows from the fact that it is the kernel of a homomorphism of G.
Now we assert that $K \subset H$, for if $b \in K$, $Hab = Ha$ for every $a \in G$, so,
in particular, $Hb = Heb = He = H$, whence $b \in H$. Finally, if N is a
normal subgroup of G which is contained in H, if $n \in N$, $a \in G$, then
$ana^{-1} \in N \subset H$, so that $Hana^{-1} = H$; thus $Han = Ha$ for all $a \in G$.
Therefore, $n \in K$ by our characterization of K.

We have proved

THEOREM 2.9.2 If G is a group, H a subgroup of G, and S is the set of all
right cosets of H in G, then there is a homomorphism $\theta$ of G into $A(S)$ and the kernel
of $\theta$ is the largest normal subgroup of G which is contained in H.
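
Theorem 2.9.2 can be checked on a concrete pair G, H. The sketch below assumes $G = S_4$ (permutations of the points 0, 1, 2, 3) and takes for H the subgroup of order 8 generated by a 4-cycle and a reflection; these choices and all helper names are ours. The kernel is computed from the action on cosets, and, independently, the set of elements of H all of whose conjugates stay in H is computed directly.

```python
from itertools import permutations

def mul(p, q):
    """Product pq: apply p first, then q (the text's right-hand convention)."""
    return tuple(q[p[i]] for i in range(len(p)))

def inv(p):
    """Inverse of the permutation p."""
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def generate(gens):
    """All products of the generators; in a finite group this is the subgroup they generate."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = mul(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

G = list(permutations(range(4)))               # S_4
H = generate([(1, 2, 3, 0), (0, 3, 2, 1)])     # a subgroup of order 8

def coset(x):
    """The right coset Hx, as a frozenset."""
    return frozenset(mul(h, x) for h in H)

S = {coset(x) for x in G}                      # the set of right cosets of H in G

def t(g):
    """The map t_g : Hx -> Hxg on S."""
    return {X: frozenset(mul(y, g) for y in X) for X in S}

# Kernel of theta: those g for which t_g is the identity map on S.
kernel = {g for g in G if all(image == X for X, image in t(g).items())}

# Elements of H whose conjugates x^{-1} h x all lie in H.
core = {h for h in H if all(mul(mul(inv(x), h), x) in H for x in G)}
```

In this example S has 3 elements, so $\theta$ maps a group of order 24 into a group of order 3! = 6 and cannot be one-to-one; the two sets `kernel` and `core` coincide and have 4 elements.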

The case $H = (e)$ just yields Cayley's theorem (Theorem 2.9.1). If H
should happen to have no normal subgroup of G other than $(e)$ in it, then
$\theta$ must be an isomorphism of G into $A(S)$. In this case we would have cut
down the size of the S used in proving Theorem 2.9.1. This is interesting
mostly for finite groups. For we shall use this observation both as a means
of proving certain finite groups have nontrivial normal subgroups, and also
as a means of representing certain finite groups as permutation groups on
small sets.

We examine these remarks a little more closely. Suppose that G has a
subgroup H whose index $i(H)$ (that is, the number of right cosets of H in G)
satisfies $i(H)! < o(G)$. Let S be the set of all right cosets of H in G. The
mapping, $\theta$, of Theorem 2.9.2 cannot be an isomorphism, for if it were,
$\theta(G)$ would have $o(G)$ elements and yet would be a subgroup of $A(S)$ which
has $i(H)! < o(G)$ elements. Therefore the kernel of $\theta$ must be larger than
$(e)$; this kernel being the largest normal subgroup of G which is contained
in H, we can conclude that H contains a nontrivial normal subgroup of G.

However, the argument used above has implications even when $i(H)!$ is
not less than $o(G)$. If $o(G)$ does not divide $i(H)!$ then by invoking Lagrange's
theorem we know that $A(S)$ can have no subgroup of order $o(G)$, hence no
subgroup isomorphic to G. However, $A(S)$ does contain $\theta(G)$, whence $\theta(G)$
cannot be isomorphic to G; that is, $\theta$ cannot be an isomorphism. But then,
as above, H must contain a nontrivial normal subgroup of G.

We summarize this as

LEMMA 2.9.1 If G is a finite group, and $H \neq G$ is a subgroup of G such that
$o(G) \nmid i(H)!$ then H must contain a nontrivial normal subgroup of G. In particular,
G cannot be simple.

APPLICATIONS

1. Let G be a group of order 36. Suppose that G has a subgroup H of
order 9 (we shall see later that this is always the case). Then $i(H) = 4$,
$4! = 24 < 36 = o(G)$ so that in H there must be a normal subgroup
$N \neq (e)$, of G, of order a divisor of 9, that is, of order 3 or 9.

2. Let G be a group of order 99 and suppose that H is a subgroup of G
of order 11 (we shall also see, later, that this must be true). Then $i(H) = 9$,
and since $99 \nmid 9!$ there is a nontrivial normal subgroup $N \neq (e)$ of G in H.
Since H is of order 11, which is a prime, its only subgroup other than $(e)$ is
itself, implying that $N = H$. That is, H itself is a normal subgroup of G.
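
The arithmetic behind these two applications is a single divisibility test. A minimal sketch, assuming we are given only the order of G and the index of a proper subgroup H (the function name is hypothetical):

```python
from math import factorial

def lemma_applies(order, index):
    """True when o(G) does not divide i(H)!, the hypothesis of Lemma 2.9.1."""
    return factorial(index) % order != 0
```

For the two cases above, `lemma_applies(36, 4)` and `lemma_applies(99, 9)` both hold. By contrast `lemma_applies(6, 3)` fails, since 3! = 6, so for a subgroup of index 3 in a group of order 6 the lemma gives no information.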

3. Let G be a non-abelian group of order 6. By Problem 11, Section 2.3,
there is an $a \neq e \in G$ satisfying $a^2 = e$. Thus the subgroup $H = \{e, a\}$ is
of order 2, and $i(H) = 3$. Suppose, for the moment, that we know that H
is not normal in G. Since H has only itself and $(e)$ as subgroups, H has no
nontrivial normal subgroups of G in it. Thus G is isomorphic to a subgroup
T of order 6 in $A(S)$, where S is the set of right cosets of H in G. Since
$o(A(S)) = i(H)! = 3! = 6$, $T = A(S)$. In other words, $G \approx A(S) = S_3$. We
would have proved that any non-abelian group of order 6 is isomorphic to
$S_3$. All that remains is to show that H is not normal in G. Since it might be
of some interest we go through a detailed proof of this. If $H = \{e, a\}$ were
normal in G, then for every $g \in G$, since $gag^{-1} \in H$ and $gag^{-1} \neq e$, we
would have that $gag^{-1} = a$, or, equivalently, that $ga = ag$ for every $g \in G$.
Let $b \in G$, $b \notin H$, and consider $N(b) = \{x \in G \mid xb = bx\}$. By an earlier
problem, $N(b)$ is a subgroup of G, and $N(b) \supset H$; $N(b) \neq H$ since
$b \in N(b)$, $b \notin H$. Since H is a subgroup of $N(b)$, $o(H) \mid o(N(b)) \mid 6$. The
only even number n, $2 < n \leq 6$ which divides 6 is 6. So $o(N(b)) = 6$;
whence b commutes with all elements of G. Thus every element of G
commutes with every other element of G, making G into an abelian group,
contrary to assumption. Thus H could not have been normal in G. This
proof is somewhat long-winded, but it illustrates some of the ideas already
developed.

Problems 

1. Let G be a group; consider the mappings of G into itself, $\lambda_g$, defined
for $g \in G$ by $x\lambda_g = gx$ for all $x \in G$. Prove that $\lambda_g$ is one-to-one and
onto, and that $\lambda_{gh} = \lambda_h\lambda_g$.

2. Let $\lambda_g$ be defined as in Problem 1, $\tau_g$ as in the proof of Theorem 2.9.1.
Prove that for any $g, h \in G$, the mappings $\lambda_g$, $\tau_h$ satisfy $\lambda_g\tau_h = \tau_h\lambda_g$.
(Hint: For $x \in G$ consider $x(\lambda_g\tau_h)$ and $x(\tau_h\lambda_g)$.)

3. If $\theta$ is a one-to-one mapping of G onto itself such that $\lambda_g\theta = \theta\lambda_g$
for all $g \in G$, prove that $\theta = \tau_h$ for some $h \in G$.

4. (a) If H is a subgroup of G show that for every $g \in G$, $gHg^{-1}$ is a
subgroup of G.







(b) Prove that W = intersection of all $gHg^{-1}$ is a normal subgroup
of G.

5. Using Lemma 2.9.1 prove that a group of order $p^2$, where p is a prime
number, must have a normal subgroup of order p.

6. Show that in a group G of order $p^2$ any normal subgroup of order p
must lie in the center of G.

7. Using the result of Problem 6, prove that any group of order $p^2$ is
abelian.

8. If p is a prime number, prove that any group G of order 2p must have
a subgroup of order p, and that this subgroup is normal in G.

9. If $o(G)$ is pq where p and q are distinct prime numbers and if G has
a normal subgroup of order p and a normal subgroup of order q, prove
that G is cyclic.

*10. Let $o(G)$ be pq, $p > q$ are primes, prove

(a) G has a subgroup of order p and a subgroup of order q.

(b) If $q \nmid p - 1$, then G is cyclic.

(c) Given two primes p, q, $q \mid p - 1$, there exists a non-abelian group
of order pq.

(d) Any two non-abelian groups of order pq are isomorphic.


2.10 Permutation Groups 

We have seen that every group can be represented isomorphically as a subgroup
of $A(S)$ for some set S, and, in particular, a finite group G can be
represented as a subgroup of $S_n$, for some n, where $S_n$ is the symmetric
group of degree n. This clearly shows that the groups $S_n$ themselves merit
closer examination.

Suppose that S is a finite set having n elements $x_1, x_2, \ldots, x_n$. If
$\phi \in A(S) = S_n$, then $\phi$ is a one-to-one mapping of S onto itself, and we
could write $\phi$ out by showing what it does to every element, e.g., $\phi: x_1 \to x_2$,
$x_2 \to x_4$, $x_4 \to x_3$, $x_3 \to x_1$. But this is very cumbersome. One short cut
might be to write $\phi$ out as

$$\begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ x_{i_1} & x_{i_2} & \cdots & x_{i_n} \end{pmatrix},$$

where $x_{i_k}$ is the image of $x_k$ under $\phi$. Returning to our example just above,
$\phi$ might be represented by

$$\begin{pmatrix} x_1 & x_2 & x_3 & x_4 \\ x_2 & x_4 & x_1 & x_3 \end{pmatrix}.$$

While this notation is a little handier there still is waste in it, for there seems
to be no purpose served by the symbol x. We could equally well represent
the permutation as

$$\begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}.$$

Our specific example would read

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 1 & 3 \end{pmatrix}.$$

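Given two permutations $\theta$, $\psi$ in $S_n$, using this symbolic representation of $\theta$
and $\psi$, what would the representation of $\theta\psi$ be? To compute it we could
start and see what $\theta\psi$ does to $x_1$ (henceforth written as 1). $\theta$ takes 1 into
$i_1$, while $\psi$ takes $i_1$ into k, say, then $\theta\psi$ takes 1 into k. Then repeat this
procedure for 2, 3, ..., n. For instance, if $\theta$ is the permutation represented
by

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 4 \end{pmatrix}$$

and $\psi$ by

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix},$$

then $i_1 = 3$ and $\psi$ takes 3 into 2, so $k = 2$ and $\theta\psi$ takes 1 into 2. Similarly
$\theta\psi: 2 \to 1$, $3 \to 3$, $4 \to 4$. That is, the representation for $\theta\psi$ is

$$\begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 3 & 4 \end{pmatrix}.$$

If we write

$$\theta = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 4 \end{pmatrix}$$

and

$$\psi = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix},$$

then

$$\theta\psi = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 3 & 4 \end{pmatrix}.$$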
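
The multiplication rule just described is short enough to state as code. A minimal sketch, assuming a permutation is stored as a dict sending each number to its image (the bottom row of the two-row symbol); `compose` is our name for the product.

```python
def compose(theta, psi):
    """The product theta psi: apply theta first, then psi, as in the text."""
    return {i: psi[theta[i]] for i in theta}

theta = {1: 3, 2: 1, 3: 2, 4: 4}
psi = {1: 1, 2: 3, 3: 2, 4: 4}
theta_psi = compose(theta, psi)
```

Here `theta_psi` sends 1 to 2, 2 to 1, and fixes 3 and 4, in agreement with the worked example. Reversing the arguments gives a different permutation, so the order of the factors matters.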

This is the way we shall multiply the symbols of the form

$$\begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}.$$

Let S be a set and $\theta \in A(S)$. Given two elements $a, b \in S$ we define
$a \equiv_\theta b$ if and only if $b = a\theta^i$ for some integer i (i can be positive, negative,
or 0). We claim this defines an equivalence relation on S. For

1. $a \equiv_\theta a$ since $a = a\theta^0 = ae$.

2. If $a \equiv_\theta b$, then $b = a\theta^i$, so that $a = b\theta^{-i}$, whence $b \equiv_\theta a$.

3. If $a \equiv_\theta b$, $b \equiv_\theta c$, then $b = a\theta^i$, $c = b\theta^j = (a\theta^i)\theta^j = a\theta^{i+j}$, which
implies that $a \equiv_\theta c$.

This equivalence relation by Theorem 1.1.1 induces a decomposition of S
into disjoint subsets, namely, the equivalence classes. We call the equivalence
class of an element $s \in S$ the orbit of s under $\theta$; thus the orbit of s under $\theta$
consists of all the elements $s\theta^i$, $i = 0, \pm 1, \pm 2, \ldots$.

In particular, if S is a finite set and $s \in S$, there is a smallest positive
integer $l = l(s)$ depending on s such that $s\theta^l = s$. The orbit of s under $\theta$
then consists of the elements $s, s\theta, s\theta^2, \ldots, s\theta^{l-1}$. By a cycle of $\theta$ we mean
the ordered set $(s, s\theta, s\theta^2, \ldots, s\theta^{l-1})$. If we know all the cycles of $\theta$ we
clearly know $\theta$ since we would know the image of any element under $\theta$.
Before proceeding we illustrate these ideas with an example. Let

$$\theta = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 2 & 1 & 3 & 5 & 6 & 4 \end{pmatrix},$$

where S consists of the elements 1, 2, ..., 6 (remember 1 stands for $x_1$,
2 for $x_2$, etc.). Starting with 1, then the orbit of 1 consists of $1 = 1\theta^0$,
$1\theta^1 = 2$, $1\theta^2 = 2\theta = 1$, so the orbit of 1 is the set of elements 1 and 2.
This tells us the orbit of 2 is the same set. The orbit of 3 consists just of 3;
that of 4 consists of the elements 4, $4\theta = 5$, $4\theta^2 = 5\theta = 6$, $4\theta^3 = 6\theta = 4$.
The cycles of $\theta$ are (1, 2), (3), (4, 5, 6).
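
The procedure used in this example (start at an element, apply the permutation until the starting element returns, then move to the next unused element) can be sketched as follows. We assume the permutation is stored as a dict of images, and we adopt the convention, ours rather than the text's, of starting each cycle at its smallest element.

```python
def cycles(theta):
    """The cycles of a permutation given as a dict, each started at its smallest element."""
    seen = set()
    result = []
    for s in sorted(theta):
        if s in seen:
            continue
        cycle = [s]
        seen.add(s)
        t = theta[s]
        while t != s:          # follow s, s theta, s theta^2, ... until s returns
            cycle.append(t)
            seen.add(t)
            t = theta[t]
        result.append(tuple(cycle))
    return result

theta = {1: 2, 2: 1, 3: 3, 4: 5, 5: 6, 6: 4}
```

For the permutation of the example, `cycles(theta)` gives the three cycles (1, 2), (3), (4, 5, 6); the underlying sets of these tuples are the orbits.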

We digress for a moment, leaving our particular $\theta$. Suppose that by the
cycle $(i_1, i_2, \ldots, i_r)$ we mean the permutation $\psi$ which sends $i_1$ into $i_2$,
$i_2$ into $i_3$, ..., $i_{r-1}$ into $i_r$ and $i_r$ into $i_1$, and leaves all other elements of S
fixed. Thus, for instance, if S consists of the elements 1, 2, ..., 9, then the
symbol (1, 3, 4, 2, 6) means the permutation

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 3 & 6 & 4 & 2 & 5 & 1 & 7 & 8 & 9 \end{pmatrix}.$$

We multiply cycles by multiplying the permutations they represent. Thus
again, if S has 9 elements,

$$(1\ 2\ 3)(5\ 6\ 4\ 1\ 8) = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 3 & 1 & 4 & 5 & 6 & 7 & 8 & 9 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 8 & 2 & 3 & 1 & 6 & 4 & 7 & 5 & 9 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 3 & 8 & 1 & 6 & 4 & 7 & 5 & 9 \end{pmatrix}.$$

Let us return to the ideas of the paragraph preceding the last one, and
ask: Given the permutation

$$\theta = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 2 & 3 & 8 & 1 & 6 & 4 & 7 & 5 & 9 \end{pmatrix},$$

what are the cycles of $\theta$? We first find the orbit of 1; namely, 1, $1\theta = 2$,
$1\theta^2 = 2\theta = 3$, $1\theta^3 = 3\theta = 8$, $1\theta^4 = 8\theta = 5$, $1\theta^5 = 5\theta = 6$, $1\theta^6 = 6\theta = 4$,
$1\theta^7 = 4\theta = 1$. That is, the orbit of 1 is the set {1, 2, 3, 8, 5, 6, 4}. The
orbits of 7 and 9 can be found to be {7}, {9}, respectively. The cycles of $\theta$
thus are (7), (9), $(1, 1\theta, 1\theta^2, \ldots, 1\theta^6)$ = (1, 2, 3, 8, 5, 6, 4). The reader
should now verify that if he takes the product (as defined in the last
paragraph) of (1, 2, 3, 8, 5, 6, 4), (7), (9) he will obtain $\theta$. That is, at least
in this case, $\theta$ is the product of its cycles.

But this is no accident for it is now trivial to prove

LEMMA 2.10.1 Every permutation is the product of its cycles.

Proof. Let $\theta$ be the permutation. Then its cycles are of the form
$(s, s\theta, \ldots, s\theta^{l-1})$. By the multiplication of cycles, as defined above, and
since the cycles of $\theta$ are disjoint, the image of $s' \in S$ under $\theta$, which is $s'\theta$,
is the same as the image of $s'$ under the product, $\psi$, of all the distinct cycles
of $\theta$. So $\theta$, $\psi$ have the same effect on every element of S, hence $\theta = \psi$,
which is what we sought to prove.
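
The lemma, together with the two worked products above, can be checked mechanically. This sketch assumes permutations of 1, ..., n stored as dicts of images; `cycle_to_perm`, `compose`, `cycles` and `product_of_cycles` are illustrative helpers of ours, with products taken left factor first as in the text.

```python
from functools import reduce

def cycle_to_perm(cycle, n):
    """The permutation of 1..n that the cycle stands for (other elements are fixed)."""
    perm = {i: i for i in range(1, n + 1)}
    for pos, s in enumerate(cycle):
        perm[s] = cycle[(pos + 1) % len(cycle)]
    return perm

def compose(theta, psi):
    """The product theta psi: apply theta first, then psi."""
    return {i: psi[theta[i]] for i in theta}

def cycles(theta):
    """The cycles of theta, each started at its smallest element."""
    seen, result = set(), []
    for s in sorted(theta):
        if s in seen:
            continue
        cycle, t = [s], theta[s]
        seen.add(s)
        while t != s:
            cycle.append(t)
            seen.add(t)
            t = theta[t]
        result.append(tuple(cycle))
    return result

def product_of_cycles(theta):
    """Multiply together all the cycles of theta."""
    identity = {i: i for i in theta}
    n = len(theta)
    return reduce(compose, (cycle_to_perm(c, n) for c in cycles(theta)), identity)

theta = dict(zip(range(1, 10), [2, 3, 8, 1, 6, 4, 7, 5, 9]))
worked_example = compose(cycle_to_perm((1, 2, 3), 9), cycle_to_perm((5, 6, 4, 1, 8), 9))
```

Both `worked_example` and `product_of_cycles(theta)` reproduce `theta`, so the product (1 2 3)(5 6 4 1 8) computed above and the product of the cycles (1, 2, 3, 8, 5, 6, 4), (7), (9) give the same permutation.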

If the remarks above are still not transparent at this point, the reader 
should take a given permutation, find its cycles, take their product, and 
verify the lemma. In doing so the lemma itself will become obvious. 

Lemma 2.10.1 is usually stated in the form every permutation can be 
uniquely expressed as a product of disjoint cycles. 

Consider the m-cycle (1, 2, ..., m). A simple computation shows that
(1, 2, ..., m) = (1, 2)(1, 3) $\cdots$ (1, m). More generally the m-cycle
$(a_1, a_2, \ldots, a_m) = (a_1, a_2)(a_1, a_3) \cdots (a_1, a_m)$. This decomposition is not
unique; by this we mean that an m-cycle can be written as a product of
2-cycles in more than one way. For instance, (1, 2, 3) = (1, 2)(1, 3) =
(3, 1)(3, 2). Now, since every permutation is a product of disjoint cycles
and every cycle is a product of 2-cycles, we have proved

LEMMA 2.10.2 Every permutation is a product of 2-cycles.

We shall refer to 2-cycles as transpositions. 
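
The "simple computation" behind the decomposition of an m-cycle can be confirmed for particular cycles. In this sketch, permutations are dicts of images and products are taken left factor first, as in the text; the helper names are ours.

```python
from functools import reduce

def cycle_to_perm(cycle, n):
    """The permutation of 1..n that the cycle stands for (other elements are fixed)."""
    perm = {i: i for i in range(1, n + 1)}
    for pos, s in enumerate(cycle):
        perm[s] = cycle[(pos + 1) % len(cycle)]
    return perm

def compose(theta, psi):
    """The product theta psi: apply theta first, then psi."""
    return {i: psi[theta[i]] for i in theta}

def as_transpositions(cycle):
    """(a1, a2, ..., am) -> [(a1, a2), (a1, a3), ..., (a1, am)]."""
    first = cycle[0]
    return [(first, other) for other in cycle[1:]]

def product(cycle_list, n):
    """Multiply the given cycles, left to right, as permutations of 1..n."""
    identity = {i: i for i in range(1, n + 1)}
    return reduce(compose, (cycle_to_perm(c, n) for c in cycle_list), identity)
```

For example, `product(as_transpositions((1, 2, 3, 4, 5)), 5)` equals `cycle_to_perm((1, 2, 3, 4, 5), 5)`, and `product([(3, 1), (3, 2)], 3)` equals `cycle_to_perm((1, 2, 3), 3)`, the second decomposition of (1, 2, 3) mentioned above.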

DEFINITION A permutation $\theta \in S_n$ is said to be an even permutation if it
can be represented as a product of an even number of transpositions.

The definition given just insists that $\theta$ have one representation as a product
of an even number of transpositions. Perhaps it has other representations
as a product of an odd number of transpositions. We first want to show
that this cannot happen. Frankly, we are not happy with the proof we give
of this fact for it introduces a polynomial which seems extraneous to the
matter at hand.

Consider the polynomial in n variables

$$p(x_1, \ldots, x_n) = \prod_{i<j} (x_i - x_j).$$

If $\theta \in S_n$ let $\theta$ act on the polynomial $p(x_1, \ldots, x_n)$ by

$$\theta: p(x_1, \ldots, x_n) = \prod_{i<j} (x_i - x_j) \to \prod_{i<j} (x_{\theta(i)} - x_{\theta(j)}).$$

It is clear that $\theta: p(x_1, \ldots, x_n) \to \pm p(x_1, \ldots, x_n)$. For instance, in $S_5$,
$\theta = (134)(25)$ takes

$$p(x_1, \ldots, x_5) = (x_1 - x_2)(x_1 - x_3)(x_1 - x_4)(x_1 - x_5)(x_2 - x_3)
\times (x_2 - x_4)(x_2 - x_5)(x_3 - x_4)(x_3 - x_5)(x_4 - x_5)$$

into

$$(x_3 - x_5)(x_3 - x_4)(x_3 - x_1)(x_3 - x_2)(x_5 - x_4)(x_5 - x_1)
\times (x_5 - x_2)(x_4 - x_1)(x_4 - x_2)(x_1 - x_2),$$

which can easily be verified to be $-p(x_1, \ldots, x_5)$.

If, in particular, $\theta$ is a transposition, $\theta: p(x_1, \ldots, x_n) \to -p(x_1, \ldots, x_n)$.
(Verify!) Thus if a permutation $\Pi$ can be represented as a product of
an even number of transpositions in one representation, $\Pi$ must leave
$p(x_1, \ldots, x_n)$ fixed, so that any representation of $\Pi$ as a product of
transpositions must be such that it leaves $p(x_1, \ldots, x_n)$ fixed; that is, in any
representation it is a product of an even number of transpositions. This
establishes that the definition given for an even permutation is a significant
one. We call a permutation odd if it is not an even permutation.
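
The sign that $\theta$ attaches to the polynomial p can be computed without expanding anything: the factor $(x_i - x_j)$, $i < j$, goes to $(x_{\theta(i)} - x_{\theta(j)})$, which is again a factor of p when $\theta(i) < \theta(j)$ and the negative of one otherwise. The sketch below assumes $\theta$ is stored as the tuple of its bottom row, so `theta[i - 1]` is the image of i; the function names are ours.

```python
from itertools import combinations, permutations

def sign(theta):
    """+1 or -1: the factor by which theta multiplies p(x_1, ..., x_n)."""
    s = 1
    for i, j in combinations(range(len(theta)), 2):
        if theta[i] > theta[j]:    # this factor of p is sent to the negative of a factor
            s = -s
    return s

def compose(theta, psi):
    """The product theta psi: apply theta first, then psi."""
    return tuple(psi[theta[i] - 1] for i in range(len(theta)))

def transposition(a, b, n):
    """The 2-cycle (a, b) as a permutation of 1..n."""
    images = list(range(1, n + 1))
    images[a - 1], images[b - 1] = b, a
    return tuple(images)

theta = (3, 5, 4, 1, 2)    # the permutation (134)(25) of the example
```

With these definitions `sign(theta)` is -1, every transposition in $S_5$ has sign -1, and the sign of a product is the product of the signs, which is the fact the argument above rests on.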

The following facts are now clear: 

1. The product of two even permutations is an even permutation. 

2. The product of an even permutation and an odd one is odd (likewise for 
the product of an odd and even permutation). 

3. The product of two odd permutations is an even permutation. 

The rule for combining even and odd permutations is like that of com¬ 
bining even and odd numbers under addition. This is not a coincidence 
since this latter rule is used in establishing 1, 2, and 3. 

Let $A_n$ be the subset of $S_n$ consisting of all even permutations. Since the
product of two even permutations is even, $A_n$ must be a subgroup of $S_n$.
We claim it is normal in $S_n$. Perhaps the best way of seeing this is as follows:
let W be the group of real numbers 1 and -1 under multiplication. Define
$\psi: S_n \to W$ by $\psi(s) = 1$ if s is an even permutation, $\psi(s) = -1$ if s is an
odd permutation. By the rules 1, 2, 3 above $\psi$ is a homomorphism onto W.
The kernel of $\psi$ is precisely $A_n$; being the kernel of a homomorphism $A_n$
is a normal subgroup of $S_n$. By Theorem 2.7.1 $S_n/A_n \approx W$, so, since

$$2 = o(W) = o\left(\frac{S_n}{A_n}\right) = \frac{o(S_n)}{o(A_n)},$$

we see that $o(A_n) = \frac{1}{2}n!$. $A_n$ is called the alternating group of degree n. We
summarize our remarks in

LEMMA 2.10.3 $S_n$ has as a normal subgroup of index 2 the alternating group,
$A_n$, consisting of all even permutations.
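
For small n the statements of Lemma 2.10.3 can be confirmed by enumeration. In this sketch a permutation is the tuple of its bottom row, and evenness is decided by counting the pairs $i < j$ whose images are out of order (the sign attached to the polynomial p above); all names are ours.

```python
from itertools import combinations, permutations
from math import factorial

def sign(theta):
    """+1 for an even permutation, -1 for an odd one."""
    s = 1
    for i, j in combinations(range(len(theta)), 2):
        if theta[i] > theta[j]:
            s = -s
    return s

def compose(theta, psi):
    """The product theta psi: apply theta first, then psi."""
    return tuple(psi[theta[i] - 1] for i in range(len(theta)))

def inverse(theta):
    """The inverse permutation."""
    r = [0] * len(theta)
    for i, image in enumerate(theta):
        r[image - 1] = i + 1
    return tuple(r)

def alternating(n):
    """The set of even permutations of 1..n."""
    return {s for s in permutations(range(1, n + 1)) if sign(s) == 1}

S4 = list(permutations(range(1, 5)))
A4 = alternating(4)
closed = all(compose(s, t) in A4 for s in A4 for t in A4)
normal = all(compose(compose(inverse(g), s), g) in A4 for g in S4 for s in A4)
```

The counts `len(alternating(n))` for n = 2, ..., 6 come out to $\frac{1}{2}n!$, and for n = 4 the set is closed under products and under conjugation by every element of $S_4$.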

At the end of the next section we shall return to S n again. 


Problems 


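1. Find the orbits and cycles of the following permutations:

(a) $\begin{pmatrix} \cdots & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ \cdots & 4 & 5 & 1 & 6 & 7 & 9 & 8 \end{pmatrix}$.

(b) $\begin{pmatrix} \cdots & 3 & 4 & 5 & \cdots \\ \cdots & 4 & 3 & 1 & \cdots \end{pmatrix}$.

2. Write the permutations in Problem 1 as the product of disjoint cycles.

3. Express as the product of disjoint cycles:

(a) (1, 2, 3)(4, 5)(1, 6, 7, 8, 9)(1, 5).

(b) (1, 2)(1, 2, 3)(1, 2).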


4. Prove that $(1, 2, \ldots, n)^{-1} = (n, n - 1, n - 2, \ldots, 2, 1)$.

5. Find the cycle structure of all the powers of (1, 2, ..., 8).

6. (a) What is the order of an n-cycle?

(b) What is the order of the product of the disjoint cycles of lengths
$m_1, m_2, \ldots, m_k$?

(c) How do you find the order of a given permutation?

7. Compute $a^{-1}ba$, where

(1) a = (1, 3, 5)(1, 2), b = (1, 5, 7, 9).

(2) a = (5, 7, 9), b = (1, 2, 3).

8. (a) Given the permutation x = (1, 2)(3, 4), y = (5, 6)(1, 3), find a
permutation a such that $a^{-1}xa = y$.

(b) Prove that there is no a such that $a^{-1}(1, 2, 3)a = (1, 3)(5, 7, 8)$.

(c) Prove that there is no permutation a such that $a^{-1}(1, 2)a =
(3, 4)(1, 5)$.

9. Determine for what m an m-cycle is an even permutation.

10. Determine which of the following are even permutations:

(a) (1, 2, 3)(1, 2).

(b) (1, 2, 3, 4, 5)(1, 2, 3)(4, 5).

(c) (1, 2)(1, 3)(1, 4)(2, 5).

11. Prove that the smallest subgroup of $S_n$ containing (1, 2) and
(1, 2, ..., n) is $S_n$. (In other words, these generate $S_n$.)

12. Prove that for $n \geq 3$ the subgroup generated by the 3-cycles is $A_n$.

*13. Prove that if a normal subgroup of $A_n$ contains even a single 3-cycle
it must be all of $A_n$.

*14. Prove that $A_5$ has no normal subgroups $N \neq (e), A_5$.

15. Assuming the result of Problem 14, prove that any subgroup of $A_5$
has order at most 12.

16. Find all the normal subgroups in $S_4$.

17. If $n \geq 5$ prove that $A_n$ is the only nontrivial normal subgroup in $S_n$.

Cayley’s theorem (Theorem 2.9.1) asserts that every group is isomorphic
to a subgroup of $A(S)$ for some $S$. In particular, it says that every finite
group can be realized as a group of permutations. Let us call the realization
of the group as a group of permutations as given in the proof of Theorem
2.9.1 the permutation representation of $G$.

18. Find the permutation representation of a cyclic group of order $n$.

19. Let $G$ be the group $\{e, a, b, ab\}$ of order 4, where $a^2 = b^2 = e$,
$ab = ba$. Find the permutation representation of $G$.

20. Let $G$ be the group $S_3$. Find the permutation representation of $S_3$.
(Note: This gives an isomorphism of $S_3$ into $S_6$.)

21. Let $G$ be the group $\{e, \theta, a, b, c, \theta a, \theta b, \theta c\}$, where $a^2 = b^2 = c^2 = \theta$,
$\theta^2 = e$, $ab = \theta ba = c$, $bc = \theta cb = a$, $ca = \theta ac = b$.

(a) Show that $\theta$ is in the center $Z$ of $G$, and that $Z = \{e, \theta\}$.

(b) Find the commutator subgroup of $G$.

(c) Show that every subgroup of $G$ is normal.

(d) Find the permutation representation of $G$.

(Note: $G$ is often called the group of quaternion units; it, and algebraic
systems constructed from it, will reappear in the book.)

22. Let $G$ be the dihedral group of order $2n$ (see Problem 17, Section 2.6).
Find the permutation representation of $G$.

Let us call the realization of a group $G$ as a set of permutations given in
Problem 1, Section 2.9 the second permutation representation of $G$.

23. Show that if $G$ is an abelian group, then the permutation representation
of $G$ coincides with the second permutation representation of $G$ (i.e.,
in the notation of the previous section, $\lambda_g = \tau_g$ for all $g \in G$).




24. Find the second permutation representation of $S_3$. Verify directly
from the permutations obtained here and in Problem 20 that $\lambda_a \tau_b =
\tau_b \lambda_a$ for all $a, b \in S_3$.

25. Find the second permutation representation of the group $G$ defined in
Problem 21.

26. Find the second permutation representation of the dihedral group of
order $2n$.

If $H$ is a subgroup of $G$, let us call the mapping $\{t_g \mid g \in G\}$ defined in
the discussion preceding Theorem 2.9.2 the coset representation of $G$ by $H$.
This also realizes $G$ as a group of permutations, but not necessarily
isomorphically, merely homomorphically (see Theorem 2.9.2).

27. Let $G = (a)$ be a cyclic group of order 8 and let $H = (a^4)$ be its
subgroup of order 2. Find the coset representation of $G$ by $H$.

28. Let $G$ be the dihedral group of order $2n$ generated by elements $a, b$
such that $a^2 = b^n = e$, $ab = b^{-1}a$. Let $H = \{e, a\}$. Find the coset
representation of $G$ by $H$.

29. Let $G$ be the group of Problem 21 and let $H = \{e, \theta\}$. Find the
coset representation of $G$ by $H$.

30. Let $G$ be $S_n$, the symmetric group of degree $n$, acting as permutations
on the set $\{1, 2, \ldots, n\}$. Let $H = \{\sigma \in G \mid n\sigma = n\}$.

(a) Prove that $H$ is isomorphic to $S_{n-1}$.

(b) Find a set of elements $a_1, \ldots, a_n \in G$ such that $Ha_1, \ldots, Ha_n$
give all the right cosets of $H$ in $G$.

(c) Find the coset representation of $G$ by $H$.

2.11 Another Counting Principle 

Mathematics is rich in technique and arguments. In this great variety one 
of the most basic tools is counting. Yet, strangely enough, it is one of the 
most difficult. Of course, by counting we do not mean the creation of tables 
of logarithms or addition tables; rather, we mean the process of precisely 
accounting for all possibilities in highly complex situations. This can
sometimes be done by a brute force case-by-case exhaustion, but such a routine
is invariably dull and violates a mathematician’s sense of aesthetics. One
prefers the light, deft, delicate touch to the hammer blow. But the most
serious objection to case-by-case division is that it works far too rarely.
Thus in various phases of mathematics we find neat counting devices which 
tell us exactly how many elements, in some fairly broad context, satisfy 
certain conditions. A great favorite with mathematicians is the process of 
counting up a given situation in two different ways; the comparison of the
two counts is then used as a means of drawing conclusions. Generally
speaking, one introduces an equivalence relation on a finite set, measures
the size of the equivalence classes under this relation, and then equates the
number of elements in the set to the sum of the orders of these equivalence
classes. This kind of an approach will be illustrated in this section. We
shall introduce a relation, prove it is an equivalence relation, and then find
a neat algebraic description for the size of each equivalence class. From this 
simple description there will flow a stream of beautiful and powerful results 
about finite groups. 

DEFINITION If $a, b \in G$, then $b$ is said to be a conjugate of $a$ in $G$ if there
exists an element $c \in G$ such that $b = c^{-1}ac$.

We shall write, for this, $a \sim b$ and shall refer to this relation as conjugacy.

LEMMA 2.11.1 Conjugacy is an equivalence relation on $G$.

Proof. As usual, in order to establish this, we must prove that

1. $a \sim a$;

2. $a \sim b$ implies that $b \sim a$;

3. $a \sim b$, $b \sim c$ implies that $a \sim c$

for all $a, b, c$ in $G$.

We prove each of these in turn.

1. Since $a = e^{-1}ae$, $a \sim a$, with $c = e$ serving as the $c$ in the definition
of conjugacy.

2. If $a \sim b$, then $b = x^{-1}ax$ for some $x \in G$, hence, $a = (x^{-1})^{-1}b(x^{-1})$,
and since $y = x^{-1} \in G$ and $a = y^{-1}by$, $b \sim a$ follows.

3. Suppose that $a \sim b$ and $b \sim c$ where $a, b, c \in G$. Then $b = x^{-1}ax$,
$c = y^{-1}by$ for some $x, y \in G$. Substituting for $b$ in the expression for $c$
we obtain $c = y^{-1}(x^{-1}ax)y = (xy)^{-1}a(xy)$; since $xy \in G$, $a \sim c$ is a
consequence.

For $a \in G$ let $C(a) = \{x \in G \mid a \sim x\}$. $C(a)$, the equivalence class of $a$
in $G$ under our relation, is usually called the conjugate class of $a$ in $G$; it
consists of the set of all distinct elements of the form $y^{-1}ay$ as $y$ ranges
over $G$.

Our attention now narrows to the case in which $G$ is a finite group.
Suppose that $C(a)$ has $c_a$ elements. We seek an alternative description of
$c_a$. Before doing so, note that $o(G) = \sum c_a$ where the sum runs over a set
of $a \in G$ using one $a$ from each conjugate class. This remark is, of course,
merely a restatement of the fact that our equivalence relation (conjugacy)
induces a decomposition of $G$ into disjoint equivalence classes (the conjugate
classes). Of paramount interest now is an evaluation of $c_a$.

In order to carry this out we recall a concept introduced in Problem 13,
Section 2.5. Since this concept is important—far too important to leave to 
the off-chance that the student solved the particular problem—we go over 
what may very well be familiar ground to many of the readers. 

DEFINITION If $a \in G$, then $N(a)$, the normalizer of $a$ in $G$, is the set
$N(a) = \{x \in G \mid xa = ax\}$.

$N(a)$ consists of precisely those elements in $G$ which commute with $a$.

LEMMA 2.11.2 $N(a)$ is a subgroup of $G$.

Proof. In this result the order of $G$, whether it be finite or infinite, is of
no relevance, and so we put no restrictions on the order of $G$.

Suppose that $x, y \in N(a)$. Thus $xa = ax$ and $ya = ay$. Therefore,
$(xy)a = x(ya) = x(ay) = (xa)y = (ax)y = a(xy)$, in consequence of which
$xy \in N(a)$. From $ax = xa$ it follows that $x^{-1}a = x^{-1}(ax)x^{-1} = x^{-1}(xa)x^{-1} =
ax^{-1}$, so that $x^{-1}$ is also in $N(a)$. But then $N(a)$ has been demonstrated
to be a subgroup of $G$.

We are now in a position to enunciate our counting principle. 

THEOREM 2.11.1 If $G$ is a finite group, then $c_a = o(G)/o(N(a))$; in other
words, the number of elements conjugate to $a$ in $G$ is the index of the normalizer of
$a$ in $G$.

Proof. To begin with, the conjugate class of $a$ in $G$, $C(a)$, consists exactly
of all the elements $x^{-1}ax$ as $x$ ranges over $G$. $c_a$ measures the number of
distinct $x^{-1}ax$’s. Our method of proof will be to show that two elements in
the same right coset of $N(a)$ in $G$ yield the same conjugate of $a$ whereas
two elements in different right cosets of $N(a)$ in $G$ give rise to different
conjugates of $a$. In this way we shall have a one-to-one correspondence
between conjugates of $a$ and right cosets of $N(a)$.

Suppose that $x, y \in G$ are in the same right coset of $N(a)$ in $G$. Thus
$y = nx$, where $n \in N(a)$, and so $na = an$. Therefore, since $y^{-1} = (nx)^{-1} =
x^{-1}n^{-1}$, $y^{-1}ay = x^{-1}n^{-1}anx = x^{-1}n^{-1}nax = x^{-1}ax$, whence $x$ and $y$
result in the same conjugate of $a$.

If, on the other hand, $x$ and $y$ are in different right cosets of $N(a)$ in $G$
we claim that $x^{-1}ax \neq y^{-1}ay$. Were this not the case, from $x^{-1}ax = y^{-1}ay$
we would deduce that $yx^{-1}a = ayx^{-1}$; this in turn would imply that
$yx^{-1} \in N(a)$. However, this declares $x$ and $y$ to be in the same right coset
of $N(a)$ in $G$, contradicting the fact that they are in different cosets. The
proof is now complete.

COROLLARY

$$o(G) = \sum \frac{o(G)}{o(N(a))},$$

where this sum runs over one element $a$ in each conjugate class.

Proof. Since $o(G) = \sum c_a$, using the theorem the corollary becomes
immediate.

The equation in this corollary is usually referred to as the class equation of G. 
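The counting principle and the class equation can be checked by brute force on a small group. The sketch below is only an illustration: the helper names (`compose`, `inverse`, `conjugate_class`, `normalizer`) are our own, permutations are stored as tuples acting on 0, 1, 2, 3 rather than 1, 2, 3, 4, and products are taken left to right as in the text.

```python
from itertools import permutations

def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugate_class(a, group):
    """C(a): all distinct x^{-1} a x as x ranges over the group."""
    return frozenset(compose(compose(inverse(x), a), x) for x in group)

def normalizer(a, group):
    """N(a) = {x in G : xa = ax}."""
    return [x for x in group if compose(x, a) == compose(a, x)]

S4 = list(permutations(range(4)))

# Theorem 2.11.1: c_a * o(N(a)) = o(G) for every a.
index_formula_holds = all(
    len(conjugate_class(a, S4)) * len(normalizer(a, S4)) == len(S4) for a in S4
)

# Class equation: the class sizes add up to o(G).
classes = {conjugate_class(a, S4) for a in S4}
class_sizes = sorted(len(c) for c in classes)
print(index_formula_holds, class_sizes, sum(class_sizes))
```

For $S_4$ the class sizes come out as 1, 3, 6, 6, 8, which sum to 24.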

Before going on to the applications of these results let us examine these 
concepts in some specific group. There is no point in looking at abelian 
groups because there two elements are conjugate if and only if they are 
equal (that is, $c_a = 1$ for every $a$). So we turn to our familiar friend, the
group $S_3$. Its elements are $e$, $(1, 2)$, $(1, 3)$, $(2, 3)$, $(1, 2, 3)$, $(1, 3, 2)$. We
enumerate the conjugate classes:

$$C(e) = \{e\}$$

$$C(1, 2) = \{(1, 2),\ (1, 3)^{-1}(1, 2)(1, 3),\ (2, 3)^{-1}(1, 2)(2, 3),\ (1, 2, 3)^{-1}(1, 2)(1, 2, 3),\ (1, 3, 2)^{-1}(1, 2)(1, 3, 2)\}$$

$$= \{(1, 2), (1, 3), (2, 3)\} \quad \text{(Verify!)}$$

$$C(1, 2, 3) = \{(1, 2, 3), (1, 3, 2)\} \quad \text{(after another verification)}.$$

The student should verify that $N((1, 2)) = \{e, (1, 2)\}$ and $N((1, 2, 3)) =
\{e, (1, 2, 3), (1, 3, 2)\}$, so that $c_{(1,2)} = \frac{6}{2} = 3$, $c_{(1,2,3)} = \frac{6}{3} = 2$.
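The two verifications requested above can also be done mechanically. In this sketch (our own illustration, not part of the text) the symbols 1, 2, 3 are relabelled 0, 1, 2, so the transposition $(1, 2)$ becomes the tuple `(1, 0, 2)` and the 3-cycle $(1, 2, 3)$ becomes `(1, 2, 0)`; the variable names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

S3 = list(permutations(range(3)))
identity = (0, 1, 2)
transposition = (1, 0, 2)   # (1, 2) written on the symbols 0, 1, 2
three_cycle = (1, 2, 0)     # (1, 2, 3) written on the symbols 0, 1, 2

summary = []
for a in (identity, transposition, three_cycle):
    conjugates = {compose(compose(inverse(x), a), x) for x in S3}
    commuting = [x for x in S3 if compose(x, a) == compose(a, x)]
    # (size of C(a), order of N(a)); their product should be o(S3) = 6
    summary.append((len(conjugates), len(commuting)))

print(summary)
```

The pairs (class size, normalizer order) are (1, 6), (3, 2), (2, 3), matching $c_{(1,2)} = 3$ and $c_{(1,2,3)} = 2$.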


Applications of Theorem 2.11.1 

Theorem 2.11.1 lends itself to immediate and powerful application. We 
need no artificial constructs to illustrate its use, for the results below which
reveal the strength of the theorem are themselves theorems of stature and
importance.

Let us recall that the center $Z(G)$ of a group $G$ is the set of all $a \in G$
such that $ax = xa$ for all $x \in G$. Note the

SUBLEMMA $a \in Z$ if and only if $N(a) = G$. If $G$ is finite, $a \in Z$ if and
only if $o(N(a)) = o(G)$.

Proof. If $a \in Z$, $xa = ax$ for all $x \in G$, whence $N(a) = G$. If, conversely,
$N(a) = G$, $xa = ax$ for all $x \in G$, so that $a \in Z$. If $G$ is finite, $o(N(a)) =
o(G)$ is equivalent to $N(a) = G$.

APPLICATION 1

THEOREM 2.11.2 If $o(G) = p^n$ where $p$ is a prime number, then $Z(G) \neq (e)$.

Proof. If $a \in G$, since $N(a)$ is a subgroup of $G$, $o(N(a))$, being a divisor
of $o(G) = p^n$, must be of the form $o(N(a)) = p^{n_a}$; $a \in Z(G)$ if and only if
$n_a = n$. Write out the class equation for this $G$, letting $z = o(Z(G))$. We
get $p^n = o(G) = \sum (p^n / p^{n_a})$; however, since there are exactly $z$ elements
such that $n_a = n$, we find that

$$p^n = z + \sum_{n_a < n} \frac{p^n}{p^{n_a}}.$$

Now look at this! $p$ is a divisor of the left-hand side; since $n_a < n$ for each
term in the $\sum$ of the right side,

$$p \,\Big|\, \frac{p^n}{p^{n_a}} = p^{n - n_a},$$

so that $p$ is a divisor of each term of this sum, hence a divisor of this sum.
Therefore,

$$p \,\Big|\, \left( p^n - \sum_{n_a < n} \frac{p^n}{p^{n_a}} \right) = z.$$

Since $e \in Z(G)$, $z \neq 0$; thus $z$ is a positive integer divisible by the prime $p$.
Therefore, $z > 1$! But then there must be an element, besides $e$, in $Z(G)$!
This is the contention of the theorem.


Rephrasing, the theorem states that a group of prime-power order must 
always have a nontrivial center. 
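A small numerical illustration, under assumptions of our own choosing: we take the dihedral group of order $8 = 2^3$, realized as permutations of the four corners 0, 1, 2, 3 of a square, and compute its center directly. The `generate` helper and the particular generators are not from the text.

```python
def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def generate(generators):
    """Close a list of permutations under right multiplication by the
    generators; for a finite set of permutations this yields the subgroup
    they generate."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = compose(x, g)
            if y not in group:
                group.add(y)
                frontier.append(y)
    return group

rotation = (1, 2, 3, 0)     # the 4-cycle 0 -> 1 -> 2 -> 3 -> 0
reflection = (0, 3, 2, 1)   # fixes 0 and 2, interchanges 1 and 3
D4 = generate([rotation, reflection])

center = {z for z in D4 if all(compose(z, x) == compose(x, z) for x in D4)}
print(len(D4), sorted(center))
```

Here the group has order 8 and the center has order 2 (the identity and the half-turn), so $p = 2$ divides $z$ as the proof predicts.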

We can now simply prove, as a corollary for this, a result given in an 
earlier problem. 

COROLLARY If $o(G) = p^2$ where $p$ is a prime number, then $G$ is abelian.

Proof. Our aim is to show that $Z(G) = G$. At any rate, we already
know that $Z(G) \neq (e)$ is a subgroup of $G$ so that $o(Z(G)) = p$ or $p^2$. If
$o(Z(G)) = p^2$, then $Z(G) = G$ and we are done. Suppose that $o(Z(G)) = p$;
let $a \in G$, $a \notin Z(G)$. Thus $N(a)$ is a subgroup of $G$, $Z(G) \subset N(a)$,
$a \in N(a)$, so that $o(N(a)) > p$, yet by Lagrange’s theorem $o(N(a)) \mid o(G) = p^2$.
The only way out is for $o(N(a)) = p^2$, implying that $a \in Z(G)$, a
contradiction. Thus $o(Z(G)) = p$ is not an actual possibility.

APPLICATION 2 We now use Theorem 2.11.1 to prove an important
theorem due to Cauchy. The reader may remember that this theorem was
already proved for abelian groups as an application of the results developed
in the section on homomorphisms. In fact, we shall make use of this special
case in the proof below. But, to be frank, we shall prove, in the very next
section, a much stronger result, due to Sylow, which has Cauchy’s theorem
as an immediate corollary, in a manner which completely avoids Theorem
2.11.1. To continue our candor, were Cauchy’s theorem itself our ultimate
and only goal, we could prove it, using the barest essentials of group theory,
in a few lines. [The reader should look up the charming, one-paragraph
proof of Cauchy’s theorem found by McKay and published in the American
Mathematical Monthly, Vol. 66 (1959), page 119.] Yet, despite all these
counter-arguments we present Cauchy’s theorem here as a striking illustration
of Theorem 2.11.1.


THEOREM 2.11.3 (Cauchy) If $p$ is a prime number and $p \mid o(G)$, then
$G$ has an element of order $p$.

Proof. We seek an element $a \neq e \in G$ satisfying $a^p = e$. To prove its
existence we proceed by induction on $o(G)$; that is, we assume the theorem
to be true for all groups $T$ such that $o(T) < o(G)$. We need not worry
about starting the induction for the result is vacuously true for groups of
order 1.

If for any subgroup $W$ of $G$, $W \neq G$, were it to happen that $p \mid o(W)$,
then by our induction hypothesis there would exist an element of order $p$ in
$W$, and thus there would be such an element in $G$. Thus we may assume that
$p$ is not a divisor of the order of any proper subgroup of $G$. In particular, if
$a \notin Z(G)$, since $N(a) \neq G$, $p \nmid o(N(a))$. Let us write down the class
equation:

$$o(G) = o(Z(G)) + \sum_{N(a) \neq G} \frac{o(G)}{o(N(a))}.$$

Since $p \mid o(G)$, $p \nmid o(N(a))$ we have that

$$p \,\Big|\, \frac{o(G)}{o(N(a))},$$

and so

$$p \,\Big|\, \sum_{N(a) \neq G} \frac{o(G)}{o(N(a))}.$$

Since we also have that $p \mid o(G)$, we conclude that

$$p \,\Big|\, \left( o(G) - \sum_{N(a) \neq G} \frac{o(G)}{o(N(a))} \right) = o(Z(G)).$$

$Z(G)$ is thus a subgroup of $G$ whose order is divisible by $p$. But, after all,
we have assumed that $p$ is not a divisor of the order of any proper subgroup
of $G$, so that $Z(G)$ cannot be a proper subgroup of $G$. We are forced to
accept the only possibility left us, namely, that $Z(G) = G$. But then $G$
is abelian; now we invoke the result already established for abelian groups
to complete the induction. This proves the theorem.
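Cauchy’s theorem is an existence statement, so on a small group one can simply search for the promised elements. The following sketch does this for $S_4$, whose order 24 is divisible by the primes 2 and 3; the `order` helper and the dictionary `witnesses` are names we introduce for illustration only, and permutations act on 0, 1, 2, 3.

```python
from itertools import permutations

def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def order(a):
    """Smallest k >= 1 with a^k equal to the identity."""
    identity = tuple(range(len(a)))
    k, power = 1, a
    while power != identity:
        power = compose(power, a)
        k += 1
    return k

S4 = list(permutations(range(4)))

# For each prime p dividing o(S4) = 24, pick the first element of order p.
witnesses = {p: next(a for a in S4 if order(a) == p) for p in (2, 3)}
print(witnesses)
```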

We conclude this section with a consideration of the conjugacy relation
in a specific class of groups, namely, the symmetric groups $S_n$.

Given the integer $n$ we say the sequence of positive integers $n_1, n_2, \ldots,
n_r$, $n_1 \leq n_2 \leq \cdots \leq n_r$, constitutes a partition of $n$ if $n = n_1 + n_2 + \cdots + n_r$.
Let $p(n)$ denote the number of partitions of $n$. Let us determine $p(n)$ for
small values of $n$:

$p(1) = 1$ since $1 = 1$ is the only partition of 1,

$p(2) = 2$ since $2 = 2$ and $2 = 1 + 1$,

$p(3) = 3$ since $3 = 3$, $3 = 1 + 2$, $3 = 1 + 1 + 1$,

$p(4) = 5$ since $4 = 4$, $4 = 1 + 3$, $4 = 1 + 1 + 2$,
$4 = 1 + 1 + 1 + 1$, $4 = 2 + 2$.

Some others are $p(5) = 7$, $p(6) = 11$, $p(61) = 1{,}121{,}505$. There is a
large mathematical literature on $p(n)$.
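The values of $p(n)$ quoted above are easy to reproduce. The sketch below uses a standard dynamic-programming count (partitions built up one allowed part size at a time); this is one convenient method among several, and the function name `partition_count` is our own.

```python
def partition_count(n):
    """Number of ways to write n as an unordered sum of positive integers."""
    # ways[t] = number of partitions of t using only the part sizes seen so far
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]

print([partition_count(n) for n in range(1, 7)])
print(partition_count(61))
```

The first line of output lists $p(1), \ldots, p(6)$, and the second gives $p(61)$ for comparison with the value in the text.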

Every time we break a given permutation in $S_n$ into a product of disjoint
cycles we obtain a partition of $n$; for if the cycles appearing have lengths $n_1,
n_2, \ldots, n_r$, respectively, $n_1 \leq n_2 \leq \cdots \leq n_r$, then $n = n_1 + n_2 + \cdots + n_r$.
We shall say a permutation $\sigma \in S_n$ has the cycle decomposition $\{n_1, n_2,
\ldots, n_r\}$ if it can be written as the product of disjoint cycles of lengths
$n_1, n_2, \ldots, n_r$, $n_1 \leq n_2 \leq \cdots \leq n_r$. Thus in $S_9$

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \\ 1 & 3 & 2 & 5 & 6 & 4 & 7 & 9 & 8 \end{pmatrix} = (1)(2, 3)(4, 5, 6)(7)(8, 9)$$

has cycle decomposition $\{1, 1, 2, 2, 3\}$; note that $1 + 1 + 2 + 2 + 3 = 9$.
We now aim to prove that two permutations in $S_n$ are conjugate if and
only if they have the same cycle decomposition. Once this is proved, then
$S_n$ will have exactly $p(n)$ conjugate classes.

To reach our goal we exhibit a very simple rule for computing the
conjugate of a given permutation. Suppose that $\sigma \in S_n$ and that $\sigma$ sends $i \to j$.
How do we find $\theta^{-1}\sigma\theta$ where $\theta \in S_n$? Suppose that $\theta$ sends $i \to s$ and
$j \to t$; then $\theta^{-1}\sigma\theta$ sends $s \to t$. In other words, to compute $\theta^{-1}\sigma\theta$ replace
every symbol in $\sigma$ by its image under $\theta$. For example, to determine $\theta^{-1}\sigma\theta$
where $\theta = (1, 2, 3)(4, 7)$ and $\sigma = (5, 6, 7)(3, 4, 2)$, then, since $\theta: 5 \to 5$,
$6 \to 6$, $7 \to 4$, $3 \to 1$, $4 \to 7$, $2 \to 3$, $\theta^{-1}\sigma\theta$ is obtained from $\sigma$ by
replacing in $\sigma$, 5 by 5, 6 by 6, 7 by 4, 3 by 1, 4 by 7, and 2 by 3, so that
$\theta^{-1}\sigma\theta = (5, 6, 4)(1, 7, 3)$.
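The relabelling rule can be compared against the definition of $\theta^{-1}\sigma\theta$ on the worked example. In this sketch permutations of $\{1, \ldots, 7\}$ are stored as dictionaries and multiplied left to right, as in the text; the helper names `from_cycles` and `relabel` are our own.

```python
def from_cycles(cycles, n):
    """Build a permutation of {1, ..., n} (as a dict) from a list of disjoint cycles."""
    perm = {i: i for i in range(1, n + 1)}
    for cycle in cycles:
        for k, symbol in enumerate(cycle):
            perm[symbol] = cycle[(k + 1) % len(cycle)]
    return perm

def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return {i: q[p[i]] for i in p}

def inverse(p):
    return {image: i for i, image in p.items()}

def relabel(cycles, theta):
    """The rule from the text: replace every symbol by its image under theta."""
    return [tuple(theta[s] for s in cycle) for cycle in cycles]

n = 7
sigma_cycles = [(5, 6, 7), (3, 4, 2)]
theta = from_cycles([(1, 2, 3), (4, 7)], n)
sigma = from_cycles(sigma_cycles, n)

by_definition = compose(compose(inverse(theta), sigma), theta)
relabelled_cycles = relabel(sigma_cycles, theta)
by_rule = from_cycles(relabelled_cycles, n)
print(relabelled_cycles, by_definition == by_rule)
```

The relabelled cycles are $(5, 6, 4)$ and $(1, 7, 3)$, and the permutation they define agrees with the product computed from the definition.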

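With this algorithm for computing conjugates it becomes clear that two
permutations having the same cycle decomposition are conjugate. For if
$\sigma = (a_1, a_2, \ldots, a_{n_1})(b_1, b_2, \ldots, b_{n_2}) \cdots (x_1, x_2, \ldots, x_{n_r})$ and
$\tau = (\alpha_1, \alpha_2, \ldots, \alpha_{n_1})(\beta_1, \beta_2, \ldots, \beta_{n_2}) \cdots (\chi_1, \chi_2, \ldots, \chi_{n_r})$,
then $\tau = \theta^{-1}\sigma\theta$, where one could use as $\theta$ the permutation

$$\theta = \begin{pmatrix} a_1 & a_2 & \cdots & a_{n_1} & b_1 & \cdots & b_{n_2} & \cdots & x_1 & \cdots & x_{n_r} \\ \alpha_1 & \alpha_2 & \cdots & \alpha_{n_1} & \beta_1 & \cdots & \beta_{n_2} & \cdots & \chi_1 & \cdots & \chi_{n_r} \end{pmatrix}.$$

Thus, for instance, $(1, 2)(3, 4, 5)(6, 7, 8)$ and $(7, 5)(1, 3, 6)(2, 4, 8)$ can be
exhibited as conjugates by using the conjugating permutation

$$\begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 7 & 5 & 1 & 3 & 6 & 2 & 4 & 8 \end{pmatrix}.$$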

That two conjugates have the same cycle decomposition is now trivial
for, by our rule, to compute a conjugate, replace every element in a given
cycle by its image under the conjugating permutation.

We restate the result proved in the previous discussion as

LEMMA 2.11.3 The number of conjugate classes in $S_n$ is $p(n)$, the number of
partitions of $n$.
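For small $n$ the lemma can be confirmed by counting in two ways: conjugate classes found by brute-force conjugation, and distinct cycle decompositions. The sketch below does both for $n \leq 5$; the function names are ours, and permutations act on $0, \ldots, n - 1$.

```python
from itertools import permutations

def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def cycle_type(perm):
    """Sorted tuple of cycle lengths, i.e. the cycle decomposition {n_1, ..., n_r}."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            length, i = 0, start
            while i not in seen:
                seen.add(i)
                i = perm[i]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

def count_conjugate_classes(n):
    group = list(permutations(range(n)))
    seen, count = set(), 0
    for a in group:
        if a not in seen:
            seen |= {compose(compose(inverse(x), a), x) for x in group}
            count += 1
    return count

def count_cycle_types(n):
    return len({cycle_type(p) for p in permutations(range(n))})

class_counts = [count_conjugate_classes(n) for n in range(1, 6)]
type_counts = [count_cycle_types(n) for n in range(1, 6)]
print(class_counts, type_counts)
```

Both lists agree with $p(1), \ldots, p(5) = 1, 2, 3, 5, 7$.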

Since we have such an explicit description of the conjugate classes in
$S_n$ we can find all the elements commuting with a given permutation. We
illustrate this with a very special and simple case.

Given the permutation $(1, 2)$ in $S_n$, what elements commute with it?
Certainly any permutation leaving both 1 and 2 fixed does. There are
$(n - 2)!$ such. Also $(1, 2)$ commutes with itself. This way we get $2(n - 2)!$
elements in the group generated by $(1, 2)$ and the $(n - 2)!$ permutations
leaving 1 and 2 fixed. Are there others? There are $n(n - 1)/2$
transpositions and these are precisely all the conjugates of $(1, 2)$. Thus the
conjugate class of $(1, 2)$ has in it $n(n - 1)/2$ elements. If the order of the
normalizer of $(1, 2)$ is $r$, then, by our counting principle,

$$\frac{n(n - 1)}{2} = \frac{o(S_n)}{r} = \frac{n!}{r}.$$

Thus $r = 2(n - 2)!$. That is, the order of the normalizer of $(1, 2)$ is
$2(n - 2)!$. But we exhibited $2(n - 2)!$ elements which commute with
$(1, 2)$; thus the general element $\sigma$ commuting with $(1, 2)$ is $\sigma = (1, 2)^i \tau$,
where $i = 0$ or 1, $\tau$ is a permutation leaving both 1 and 2 fixed.

As another application consider the permutation $(1, 2, 3, \ldots, n) \in S_n$.
We claim this element commutes only with its powers. Certainly it does
commute with all its powers, and this gives rise to $n$ elements. Now, any
$n$-cycle is conjugate to $(1, 2, \ldots, n)$ and there are $(n - 1)!$ distinct
$n$-cycles in $S_n$. Thus if $u$ denotes the order of the normalizer of $(1, 2, \ldots, n)$
in $S_n$, since $o(S_n)/u$ = number of conjugates of $(1, 2, \ldots, n)$ in $S_n$ =
$(n - 1)!$,

$$u = \frac{n!}{(n - 1)!} = n.$$

So the order of the normalizer of $(1, 2, \ldots, n)$ in $S_n$ is $n$. The powers of
$(1, 2, \ldots, n)$ having given us $n$ such elements, there is no room left for
others and we have proved our contention.
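Both normalizer orders derived above can be checked by exhaustive search for small $n$. In this sketch (an illustration of ours, with hypothetical names) the transposition $(1, 2)$ and the $n$-cycle $(1, 2, \ldots, n)$ are written on the symbols $0, \ldots, n - 1$.

```python
from itertools import permutations
from math import factorial

def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def normalizer_order(a):
    """Number of permutations x of the same degree with xa = ax."""
    n = len(a)
    return sum(1 for x in permutations(range(n)) if compose(x, a) == compose(a, x))

results = []
for n in range(3, 7):
    transposition = (1, 0) + tuple(range(2, n))   # interchanges the first two symbols
    n_cycle = tuple(range(1, n)) + (0,)           # 0 -> 1 -> ... -> n-1 -> 0
    results.append((n, normalizer_order(transposition), normalizer_order(n_cycle)))

for n, r, u in results:
    print(n, r, 2 * factorial(n - 2), u)
```

For each $n$ the second and third printed numbers should coincide (both are $2(n - 2)!$), and the last should equal $n$.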

Problems 

1. List all the conjugate classes in $S_3$, find the $c_a$’s, and verify the class
equation.

2. List all the conjugate classes in $S_4$, find the $c_a$’s and verify the class
equation.

3. List all the conjugate classes in the group of quaternion units (see
Problem 21, Section 2.10), find the $c_a$’s and verify the class equation.

4. List all the conjugate classes in the dihedral group of order $2n$, find
the $c_a$’s and verify the class equation. Notice how the answer depends
on the parity of $n$.

5. (a) In $S_n$ prove that there are $\frac{1}{r} \frac{n!}{(n - r)!}$ distinct $r$-cycles.

(b) Using this, find the number of conjugates that the $r$-cycle
$(1, 2, \ldots, r)$ has in $S_n$.

(c) Prove that any element $\sigma$ in $S_n$ which commutes with $(1, 2, \ldots, r)$
is of the form $\sigma = (1, 2, \ldots, r)^i \tau$, where $i = 0, 1, 2, \ldots, r$, $\tau$
is a permutation leaving all of $1, 2, \ldots, r$ fixed.

6. (a) Find the number of conjugates of $(1, 2)(3, 4)$ in $S_n$, $n \geq 4$.

(b) Find the form of all elements commuting with $(1, 2)(3, 4)$ in $S_n$.

7. If $p$ is a prime number, show that in $S_p$ there are $(p - 1)! + 1$
elements $x$ satisfying $x^p = e$.

8. If in a finite group $G$ an element $a$ has exactly two conjugates, prove
that $G$ has a normal subgroup $N \neq (e), G$.

9. (a) Find two elements in $A_5$, the alternating group of degree 5, which
are conjugate in $S_5$ but not in $A_5$.

(b) Find all the conjugate classes in $A_5$ and the number of elements
in each conjugate class.

10. (a) If $N$ is a normal subgroup of $G$ and $a \in N$, show that every
conjugate of $a$ in $G$ is also in $N$.

(b) Prove that $o(N) = \sum c_a$ for some choices of $a$ in $N$.

(c) Using this and the result for Problem 9(b), prove that in $A_5$ there
is no normal subgroup $N$ other than $(e)$ and $A_5$.

11. Using Theorem 2.11.2 as a tool, prove that if $o(G) = p^n$, $p$ a prime
number, then $G$ has a subgroup of order $p^\alpha$ for all $0 \leq \alpha \leq n$.

12. If $o(G) = p^n$, $p$ a prime number, prove that there exist subgroups
$N_i$, $i = 0, 1, \ldots, r$ (for some $r$) such that $G = N_0 \supset N_1 \supset N_2 \supset \cdots
\supset N_r = (e)$ where $N_i$ is a normal subgroup of $N_{i-1}$ and where
$N_{i-1}/N_i$ is abelian.

13. If $o(G) = p^n$, $p$ a prime number, and $H \neq G$ is a subgroup of $G$,
show that there exists an $x \in G$, $x \notin H$ such that $x^{-1}Hx = H$.

14. Prove that any subgroup of order $p^{n-1}$ in a group $G$ of order $p^n$,
$p$ a prime number, is normal in $G$.

*15. If $o(G) = p^n$, $p$ a prime number, and if $N \neq (e)$ is a normal subgroup
of $G$, prove that $N \cap Z \neq (e)$, where $Z$ is the center of $G$.

16. If $G$ is a group, $Z$ its center, and if $G/Z$ is cyclic, prove that $G$ must
be abelian.

17. Prove that any group of order 15 is cyclic.

18. Prove that a group of order 28 has a normal subgroup of order 7.

19. Prove that if a group $G$ of order 28 has a normal subgroup of order 4,
then $G$ is abelian.

2.12 Sylow's Theorem 

Lagrange’s theorem tells us that the order of a subgroup of a finite group is
a divisor of the order of that group. The converse, however, is false. There 
are very few theorems which assert the existence of subgroups of prescribed 
order in arbitrary finite groups. The most basic, and widely used, is a 
classic theorem due to the Norwegian mathematician Sylow. 

We present here three proofs of this result of Sylow. The first is a very 
elegant and elementary argument due to Wielandt. It appeared in the 
journal Archiv der Mathematik, Vol. 10 (1959), pages 401-402. The basic
elements in Wielandt’s proof are number-theoretic and combinatorial. It
has the advantage, aside from its elegance and simplicity, of producing the
subgroup we are seeking. The second proof is based on an exploitation of
induction in an interplay with the class equation. It is one of the standard
classical proofs, and is a nice illustration of combining many of the ideas
developed so far in the text to derive this very important cornerstone due to
Sylow. The third proof is of a completely different philosophy. The basic
idea there is to show that if a larger group than the one we are considering
satisfies the conclusion of Sylow’s theorem, then our group also must.
This forces us to prove Sylow’s theorem for a special family of groups, the
symmetric groups. By invoking Cayley’s theorem (Theorem 2.9.1) we are 
then able to deduce Sylow’s theorem for all finite groups. Apart from this 
strange approach—to prove something for a given group, first prove it for a 
much larger one—this third proof has its own advantages. Exploiting the 
ideas used, we easily derive the so-called second and third parts of Sylow’s 
theorem. 

One might wonder: why give three proofs of the same result when, clearly, 
one suffices? The answer is simple. Sylow’s theorem is that important that 
it merits this multifront approach. Add to this the completely diverse 
nature of the three proofs and the nice application each gives of different 
things that we have learned, the justification for the whole affair becomes 
persuasive (at least to the author). Be that as it may, we state Sylow’s 
theorem and get on with Wielandt’s proof. 


THEOREM 2.12.1 (Sylow) If $p$ is a prime number and $p^\alpha \mid o(G)$, then
$G$ has a subgroup of order $p^\alpha$.

Before entering the first proof of the theorem we digress slightly to a
brief number-theoretic and combinatorial discussion.

The number of ways of picking a subset of $k$ elements from a set of $n$
elements can easily be shown to be

$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}.$$

If $n = p^\alpha m$ where $p$ is a prime number, and if $p^r \mid m$ but $p^{r+1} \nmid m$, consider

$$\binom{p^\alpha m}{p^\alpha} = \frac{(p^\alpha m)!}{(p^\alpha)!\,(p^\alpha m - p^\alpha)!} = \frac{p^\alpha m (p^\alpha m - 1) \cdots (p^\alpha m - i) \cdots (p^\alpha m - p^\alpha + 1)}{p^\alpha (p^\alpha - 1) \cdots (p^\alpha - i) \cdots (p^\alpha - p^\alpha + 1)}.$$

The question is, What power of $p$ divides $\binom{p^\alpha m}{p^\alpha}$? Looking at this number,
written out as we have written it out, one can see that except for the term
$m$ in the numerator, the power of $p$ dividing $(p^\alpha m - i)$ is the same as that
dividing $p^\alpha - i$, so all powers of $p$ cancel out except the power which
divides $m$. Thus

$$p^r \,\Big|\, \binom{p^\alpha m}{p^\alpha} \quad \text{but} \quad p^{r+1} \nmid \binom{p^\alpha m}{p^\alpha}.$$

First Proof of the Theorem. Let $\mathcal{M}$ be the set of all subsets of $G$ which
have $p^\alpha$ elements. Thus $\mathcal{M}$ has $\binom{p^\alpha m}{p^\alpha}$ elements. Given $M_1, M_2 \in \mathcal{M}$
($M_1$ is a subset of $G$ having $p^\alpha$ elements, and likewise so is $M_2$) define
$M_1 \sim M_2$ if there exists an element $g \in G$ such that $M_1 = M_2 g$. It is
immediate to verify that this defines an equivalence relation on $\mathcal{M}$. We
claim that there is at least one equivalence class of elements in $\mathcal{M}$ such that
the number of elements in this class is not a multiple of $p^{r+1}$, for if $p^{r+1}$ is
a divisor of the size of each equivalence class, then $p^{r+1}$ would be a divisor
of the number of elements in $\mathcal{M}$. Since $\mathcal{M}$ has $\binom{p^\alpha m}{p^\alpha}$ elements and
$p^{r+1} \nmid \binom{p^\alpha m}{p^\alpha}$, this cannot be the case. Let $\{M_1, \ldots, M_n\}$ be such an
equivalence class in $\mathcal{M}$ where $p^{r+1} \nmid n$. By our very definition of equivalence
in $\mathcal{M}$, if $g \in G$, for each $i = 1, \ldots, n$, $M_i g = M_j$ for some $j$, $1 \leq j \leq n$.
We let $H = \{g \in G \mid M_1 g = M_1\}$. Clearly $H$ is a subgroup of $G$, for if
$a, b \in H$, then $M_1 a = M_1$, $M_1 b = M_1$ whence $M_1 ab = (M_1 a)b = M_1 b =
M_1$. We shall be vitally concerned with $o(H)$. We claim that $n\,o(H) =
o(G)$; we leave the proof to the reader, but suggest the argument used in
the counting principle in Section 2.11. Now $n\,o(H) = o(G) = p^\alpha m$; since
$p^{r+1} \nmid n$ and $p^{\alpha+r} \mid p^\alpha m = n\,o(H)$, it must follow that $p^\alpha \mid o(H)$, and so
$o(H) \geq p^\alpha$. However, if $m_1 \in M_1$, then for all $h \in H$, $m_1 h \in M_1$. Thus
$M_1$ has at least $o(H)$ distinct elements. However, $M_1$ was a subset of $G$
containing $p^\alpha$ elements. Thus $p^\alpha \geq o(H)$. Combined with $o(H) \geq p^\alpha$ we
have that $o(H) = p^\alpha$. But then we have exhibited a subgroup of $G$ having exactly
$p^\alpha$ elements, namely $H$. This proves the theorem; it actually has done more:
it has constructed the required subgroup before our very eyes!
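Since the proof is constructive, it can be carried out literally on a small group. The sketch below first spot-checks the binomial-coefficient fact for a few triples $(p, \alpha, m)$ of our own choosing, then runs the construction on the alternating group $A_4$ (order $12 = 2^2 \cdot 3$) with $p = 2$, $\alpha = 2$. All function names are ours, permutations act on 0, 1, 2, 3 and are multiplied left to right, and the exhaustive search over subsets is feasible only because the group is tiny.

```python
from itertools import combinations, permutations
from math import comb

def compose(p, q):
    """Left-to-right product pq: apply p first, then q."""
    return tuple(q[p[i]] for i in range(len(p)))

def is_even(perm):
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return inversions % 2 == 0

def p_adic_valuation(x, p):
    """Largest e such that p**e divides the positive integer x."""
    e = 0
    while x % p == 0:
        x //= p
        e += 1
    return e

def wielandt_subgroup(group, p, alpha):
    """Search the p**alpha-element subsets M_1 for one whose class {M_1 g}
    has size not divisible by p**(r+1); return H = {g : M_1 g = M_1}."""
    m = len(group) // p ** alpha
    r = p_adic_valuation(m, p)
    for subset in combinations(group, p ** alpha):
        M1 = frozenset(subset)
        cls = {frozenset(compose(x, g) for x in M1) for g in group}
        if len(cls) % p ** (r + 1) != 0:
            return {g for g in group if frozenset(compose(x, g) for x in M1) == M1}
    return None

# The counting fact: the power of p dividing C(p^alpha m, p^alpha) equals that dividing m.
samples = [(2, 2, 3), (2, 1, 12), (3, 2, 6), (5, 1, 50)]
valuations_agree = all(
    p_adic_valuation(comb(p ** a * m, p ** a), p) == p_adic_valuation(m, p)
    for p, a, m in samples
)

A4 = [g for g in permutations(range(4)) if is_even(g)]
H = wielandt_subgroup(A4, p=2, alpha=2)
print(valuations_agree, len(H), sorted(H))
```

The subgroup produced has order 4; in $A_4$ it consists of the identity and the three products of two disjoint transpositions.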

What is usually known as Sylow’s theorem is a special case of Theorem 
2.12.1, namely that 

COROLLARY If $p^m \mid o(G)$, $p^{m+1} \nmid o(G)$, then $G$ has a subgroup of order $p^m$.

A subgroup of $G$ of order $p^m$, where $p^m \mid o(G)$ but $p^{m+1} \nmid o(G)$, is called a
$p$-Sylow subgroup of $G$. The corollary above asserts that a finite group has
$p$-Sylow subgroups for every prime $p$ dividing its order. Of course the
conjugate of a $p$-Sylow subgroup is a $p$-Sylow subgroup. In a short while
we shall see how any two $p$-Sylow subgroups of $G$, for the same prime $p$,
are related. We shall also get some information on how many $p$-Sylow
subgroups there are in $G$ for a given prime $p$. Before passing to this, we want
to give two other proofs of Sylow’s theorem.

We begin with a remark. As we observed just prior to the corollary,
the corollary is a special case of the theorem. However, we claim that the
theorem is easily derivable from the corollary. That is, if we know that $G$
possesses a subgroup of order $p^m$, where $p^m \mid o(G)$ but $p^{m+1} \nmid o(G)$, then
we know that $G$ has a subgroup of order $p^\alpha$ for any $\alpha$ such that $p^\alpha \mid o(G)$.
This follows from the result of Problem 11, Section 2.11. This result states
that any group of order $p^m$, $p$ a prime, has subgroups of order $p^\alpha$ for any
$0 \leq \alpha \leq m$. Thus to prove Theorem 2.12.1, as we shall proceed to do,
again, in two more ways, it is enough for us to prove the existence of
$p$-Sylow subgroups of $G$, for every prime $p$ dividing the order of $G$.


Second Proof of Sylow’s Theorem. We prove, by induction on the order
of the group $G$, that for every prime $p$ dividing the order of $G$, $G$ has a
$p$-Sylow subgroup.

If the order of the group is 2, the only relevant prime is 2 and the group
certainly has a subgroup of order 2, namely itself.

So we suppose the result to be correct for all groups of order less than
$o(G)$. From this we want to show that the result is valid for $G$. Suppose,
then, that $p^m \mid o(G)$, $p^{m+1} \nmid o(G)$, where $p$ is a prime, $m \geq 1$. If $p^m \mid o(H)$
for any subgroup $H$ of $G$, where $H \neq G$, then by the induction hypothesis,
$H$ would have a subgroup $T$ of order $p^m$. However, since $T$ is a subgroup
of $H$, and $H$ is a subgroup of $G$, $T$ too is a subgroup of $G$. But then $T$ would
be the sought-after subgroup of order $p^m$.

We therefore may assume that $p^m \nmid o(H)$ for any subgroup $H$ of $G$, where
$H \neq G$. We restrict our attention to a limited set of such subgroups.
Recall that if $a \in G$ then $N(a) = \{x \in G \mid xa = ax\}$ is a subgroup of $G$;
moreover, if $a \notin Z$, the center of $G$, then $N(a) \neq G$. Recall, too, that the
class equation of $G$ states that

$$o(G) = \sum \frac{o(G)}{o(N(a))},$$

where this sum runs over one element $a$ from each conjugate class. We
separate this sum into two pieces: those $a$ which lie in $Z$, and those which
don’t. This gives

$$o(G) = z + \sum_{a \notin Z} \frac{o(G)}{o(N(a))},$$

where $z = o(Z)$. Now invoke the reduction we have made, namely, that
$p^m \nmid o(H)$ for any subgroup $H \neq G$ of $G$, to those subgroups $N(a)$ for $a \notin Z$.
Since in this case, $p^m \mid o(G)$ and $p^m \nmid o(N(a))$, we must have that

$$p \,\Big|\, \frac{o(G)}{o(N(a))}.$$

Restating this result,

$$p \,\Big|\, \frac{o(G)}{o(N(a))}$$

for every $a \in G$ where $a \notin Z$. Look at the class equation with this information
in hand. Since $p^m \mid o(G)$, we have that $p \mid o(G)$; also

$$p \,\Big|\, \sum_{a \notin Z} \frac{o(G)}{o(N(a))}.$$

Thus the class equation gives us that $p \mid z$. Since $p \mid z = o(Z)$, by Cauchy’s
theorem (Theorem 2.11.3), $Z$ has an element $b \neq e$ of order $p$. Let
$B = (b)$, the subgroup of $G$ generated by $b$. $B$ is of order $p$; moreover,
since $b \in Z$, $B$ must be normal in $G$. Hence we can form the quotient group
$\bar{G} = G/B$. We look at $\bar{G}$. First of all, its order is $o(G)/o(B) = o(G)/p$,
hence is certainly less than $o(G)$. Secondly, we have $p^{m-1} \mid o(\bar{G})$, but
$p^m \nmid o(\bar{G})$. Thus, by the induction hypothesis, $\bar{G}$ has a subgroup $\bar{P}$ of order
$p^{m-1}$. Let $P = \{x \in G \mid xB \in \bar{P}\}$; by Lemma 2.7.5, $P$ is a subgroup of
$G$. Moreover, $\bar{P} \approx P/B$ (Prove!); thus

$$p^{m-1} = o(\bar{P}) = \frac{o(P)}{o(B)} = \frac{o(P)}{p}.$$

This results in $o(P) = p^m$. Therefore $P$ is the required $p$-Sylow subgroup of
$G$. This completes the induction and so proves the theorem.

With this we have finished the second proof of Sylow’s theorem. Note 
that this second proof can easily be adapted to prove that if $p^\alpha \mid o(G)$, then
$G$ has a subgroup of order $p^\alpha$ directly, without first passing to the existence
of a $p$-Sylow subgroup. (This is Problem 1 of the problems at the end of
this section.) 

We now proceed to the third proof of Sylow’s theorem. 

Third Proof of Sylow’s Theorem. Before going into the details of the
proof proper, we outline its basic strategy. We will first show that the
symmetric groups $S_{p^r}$, $p$ a prime, all have $p$-Sylow subgroups. The next
step will be to show that if $G$ is contained in $M$ and $M$ has a $p$-Sylow
subgroup, then $G$ has a $p$-Sylow subgroup. Finally we will show, via Cayley’s
theorem, that we can use $S_{p^k}$, for large enough $k$, as our $M$. With this we
will have all the pieces, and the theorem will drop out.

In carrying out this program in detail, we will have to know how large
a $p$-Sylow subgroup of $S_{p^r}$ should be. This will necessitate knowing what
power of $p$ divides $(p^r)!$. This will be easy. To produce the $p$-Sylow
subgroup of $S_{p^r}$ will be harder. To carry out another vital step in this rough
sketch, it will be necessary to introduce a new equivalence relation in groups,
and the corresponding equivalence classes known as double cosets. This
will have several payoffs, not only in pushing through the proof of Sylow’s
theorem, but also in getting us the second and third parts of the full Sylow
theorem.

So we get down to our first task, that of finding what power of a prime
$p$ exactly divides $(p^k)!$. Actually, it is quite easy to do this for $n!$ for any
integer $n$ (see Problem 2). But, for our purposes, it will be clearer and will
suffice to do it only for $(p^k)!$.

Let $n(k)$ be defined by $p^{n(k)} \mid (p^k)!$ but $p^{n(k)+1} \nmid (p^k)!$.

LEMMA 2.12.1 $n(k) = 1 + p + \cdots + p^{k-1}$.

Proof. If $k = 1$ then, since $p! = 1 \cdot 2 \cdots (p - 1) \cdot p$, it is clear that
$p \mid p!$ but $p^2 \nmid p!$. Hence $n(1) = 1$, as it should be.

What terms in the expansion of $(p^k)!$ can contribute to powers of $p$
dividing $(p^k)!$? Clearly, only the multiples of $p$; that is, $p, 2p, \ldots, p^{k-1}p$.
In other words $n(k)$ must be the power of $p$ which divides

$$p(2p)(3p) \cdots (p^{k-1}p) = p^{p^{k-1}}(p^{k-1})!.$$

But then $n(k) = p^{k-1} + n(k-1)$.
Similarly, $n(k-1) = n(k-2) + p^{k-2}$, and so on. Write these out as

$$\begin{aligned}
n(k) - n(k-1) &= p^{k-1}, \\
n(k-1) - n(k-2) &= p^{k-2}, \\
&\;\;\vdots \\
n(2) - n(1) &= p, \\
n(1) &= 1.
\end{aligned}$$

Adding these up, with the cross-cancellation that we get, we obtain
$n(k) = 1 + p + p^2 + \cdots + p^{k-1}$. This is what was claimed in the lemma,
so we are done.
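
As a quick sanity check on Lemma 2.12.1, the following sketch compares the closed form with a direct count of the power of $p$ in $(p^k)!$ for a few small cases. The helper names `p_adic_valuation` and `n_of_k` are illustrative only and do not come from the text.

```python
from math import factorial

def p_adic_valuation(n, p):
    """Largest exponent e such that p**e divides the positive integer n."""
    e = 0
    while n % p == 0:
        n //= p
        e += 1
    return e

def n_of_k(p, k):
    """Closed form of Lemma 2.12.1: 1 + p + ... + p**(k-1)."""
    return sum(p**i for i in range(k))

# Compare the direct count with the closed form for a few small (p, k).
for p, k in [(2, 1), (2, 3), (3, 2), (5, 2)]:
    direct = p_adic_valuation(factorial(p**k), p)
    assert direct == n_of_k(p, k)
    print(f"p={p}, k={k}: power of p in (p^k)! is {direct}")
```

The check is brute force and only practical for small $p^k$; it illustrates the statement and is not a substitute for the induction above.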

We are now ready to show that $S_{p^k}$ has a $p$-Sylow subgroup; that is, we
shall show (in fact, produce) a subgroup of order $p^{n(k)}$ in $S_{p^k}$.

LEMMA 2.12.2 $S_{p^k}$ has a $p$-Sylow subgroup.

Proof. We go by induction on $k$. If $k = 1$, then the element $(1\ 2\ \ldots\ p)$
in $S_p$ is of order $p$, so generates a subgroup of order $p$. Since $n(1) = 1$,
the result certainly checks out for $k = 1$.

Suppose that the result is correct for $k - 1$; we want to show that it
then must follow for $k$. Divide the integers $1, 2, \ldots, p^k$ into $p$ clumps,
each with $p^{k-1}$ elements as follows:

$$\{1, 2, \ldots, p^{k-1}\},\ \{p^{k-1} + 1, p^{k-1} + 2, \ldots, 2p^{k-1}\},\ \ldots,\ \{(p-1)p^{k-1} + 1, \ldots, p^k\}.$$

The permutation $\sigma$ defined by

$$\sigma = (1,\ p^{k-1}+1,\ 2p^{k-1}+1,\ \ldots,\ (p-1)p^{k-1}+1) \cdots (j,\ p^{k-1}+j,\ 2p^{k-1}+j,\ \ldots,\ (p-1)p^{k-1}+j) \cdots (p^{k-1},\ 2p^{k-1},\ \ldots,\ (p-1)p^{k-1},\ p^k)$$

has the following properties:

1. $\sigma^p = e$.

2. If $\tau$ is a permutation that leaves all $i$ fixed for $i > p^{k-1}$ (hence, affects
only $1, 2, \ldots, p^{k-1}$), then $\sigma^{-1}\tau\sigma$ moves only elements in $\{p^{k-1} + 1,
p^{k-1} + 2, \ldots, 2p^{k-1}\}$, and more generally, $\sigma^{-j}\tau\sigma^{j}$ moves only elements
in $\{jp^{k-1} + 1, jp^{k-1} + 2, \ldots, (j+1)p^{k-1}\}$.

Consider $A = \{\tau \in S_{p^k} \mid \tau(i) = i \text{ if } i > p^{k-1}\}$. $A$ is a subgroup of $S_{p^k}$
and elements in $A$ can carry out any permutation on $1, 2, \ldots, p^{k-1}$.
From this it follows easily that $A \approx S_{p^{k-1}}$. By induction, $A$ has a subgroup
$P_1$ of order $p^{n(k-1)}$.

Let $T = P_1(\sigma^{-1}P_1\sigma)(\sigma^{-2}P_1\sigma^{2}) \cdots (\sigma^{-(p-1)}P_1\sigma^{p-1}) = P_1P_2 \cdots P_p$,
where $P_{i+1} = \sigma^{-i}P_1\sigma^{i}$ for $0 \le i \le p - 1$. Each $P_i$ is isomorphic to $P_1$ so has
order $p^{n(k-1)}$. Also elements in distinct $P_i$’s influence nonoverlapping sets of
integers, hence commute. Thus $T$ is a subgroup of $S_{p^k}$. What is its order? Since
$P_i \cap P_j = (e)$ if $1 \le i \neq j \le p$, we see that $o(T) = o(P_1)^p = p^{pn(k-1)}$.
We are not quite there yet. $T$ is not the $p$-Sylow subgroup we seek!

Since $\sigma^p = e$ and $\sigma^{-1}P_i\sigma = P_{i+1}$ (reading the index modulo $p$, so that
$\sigma^{-1}P_p\sigma = P_1$) we have $\sigma^{-1}T\sigma = T$. Let $P =
\{\sigma^{j}t \mid t \in T,\ 0 \le j \le p - 1\}$. Since $\sigma \notin T$ and $\sigma^{-1}T\sigma = T$ we have two
things: firstly, $P$ is a subgroup of $S_{p^k}$ and, furthermore, $o(P) = p \cdot o(T) =
p \cdot p^{n(k-1)p} = p^{n(k-1)p+1}$. Now we are finally there! $P$ is the sought-after
$p$-Sylow subgroup of $S_{p^k}$.

Why? Well, what is its order? It is $p^{n(k-1)p+1}$. But $n(k-1) =
1 + p + \cdots + p^{k-2}$, hence $pn(k-1) + 1 = 1 + p + \cdots + p^{k-1} = n(k)$.
Since now $o(P) = p^{n(k)}$, $P$ is indeed a $p$-Sylow subgroup of $S_{p^k}$.

Note something about the proof. Not only does it prove the lemma, it
actually allows us to construct the $p$-Sylow subgroup inductively. We
follow the procedure of the proof to construct a 2-Sylow subgroup in $S_4$.

Divide 1, 2, 3, 4 into $\{1, 2\}$ and $\{3, 4\}$. Let $P_1 = ((1\ 2))$ and $\sigma =
(1\ 3)(2\ 4)$. Then $P_2 = \sigma^{-1}P_1\sigma = ((3\ 4))$. Our 2-Sylow subgroup is then
the group generated by $(1\ 3)(2\ 4)$ and

$$T = P_1P_2 = \{(1\ 2),\ (3\ 4),\ (1\ 2)(3\ 4),\ e\}.$$
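
The construction in $S_4$ can be replayed by machine. The sketch below is only an illustration under stated assumptions: permutations are stored as tuples acting on $0, 1, 2, 3$ (so the text's symbol $i$ becomes $i - 1$), products are read left to right as in the text, and the names `compose` and `generate` are hypothetical helpers, not notation from the book.

```python
def compose(s, t):
    """Product 'first s, then t' of permutations stored as tuples on 0..n-1."""
    return tuple(t[s[i]] for i in range(len(s)))

def generate(gens):
    """Subgroup generated by gens, found by closing under right multiplication.

    In a finite group this closure already contains all inverses.
    """
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = compose(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

sigma = (2, 3, 0, 1)   # (1 3)(2 4) in the text's labeling; it is its own inverse
a = (1, 0, 2, 3)       # (1 2), the generator of P_1

# P_2 = sigma^{-1} P_1 sigma is generated by sigma^{-1} (1 2) sigma.
b = compose(compose(sigma, a), sigma)

T = generate([a, b])          # T = P_1 P_2
P = generate([sigma, a, b])   # the subgroup generated by sigma and T

print(b, len(T), len(P))
```

Here `b` comes out as the transposition of the last two symbols, that is $(3\ 4)$, and the orders of `T` and `P` are 4 and 8, matching $o(T) = p^{pn(k-1)}$ and $o(P) = p^{n(k)}$ for $p = 2$, $k = 2$.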

In order to carry out the program of the third proof that we outlined, we 
now introduce a new equivalence relation in groups (see Problem 39, 
Section 2.5). 

DEFINITION Let $G$ be a group, $A$, $B$ subgroups of $G$. If $x, y \in G$ define
$x \sim y$ if $y = axb$ for some $a \in A$, $b \in B$.

We leave to the reader the verification (it is easy) of

LEMMA 2.12.3 The relation defined above is an equivalence relation on $G$.
The equivalence class of $x \in G$ is the set $AxB = \{axb \mid a \in A,\ b \in B\}$.

We call the set $AxB$ a double coset of $A$, $B$ in $G$.

If $A$, $B$ are finite subgroups of $G$, how many elements are there in the
double coset $AxB$? To begin with, the mapping $T: AxB \to AxBx^{-1}$ given
by $(axb)T = axbx^{-1}$ is one-to-one and onto (verify). Thus $o(AxB) =
o(AxBx^{-1})$. Since $xBx^{-1}$ is a subgroup of $G$, of order $o(B)$, by Theorem 2.5.1,

$$o(AxB) = o(AxBx^{-1}) = \frac{o(A)\,o(xBx^{-1})}{o(A \cap xBx^{-1})} = \frac{o(A)\,o(B)}{o(A \cap xBx^{-1})}.$$

We summarize this in

LEMMA 2.12.4 If $A$, $B$ are finite subgroups of $G$ then

$$o(AxB) = \frac{o(A)\,o(B)}{o(A \cap xBx^{-1})}.$$
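
The counting formula for double cosets can be checked by brute force in a small case. The sketch below takes $G = S_4$ with two subgroups of order 4 chosen purely for illustration; the tuple encoding of permutations on $0, \ldots, 3$ and the helpers `mul`, `inverse`, `generate` are assumptions of this example rather than anything defined in the text.

```python
from itertools import permutations

def mul(s, t):
    """Product 'first s, then t' of permutations stored as tuples on 0..n-1."""
    return tuple(t[s[i]] for i in range(len(s)))

def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def generate(gens):
    """Subgroup generated by gens (closure under right multiplication)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = mul(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

G = list(permutations(range(4)))             # S_4
A = generate([(1, 0, 2, 3), (0, 1, 3, 2)])   # generated by (1 2) and (3 4)
B = generate([(1, 2, 3, 0)])                 # generated by (1 2 3 4)

sizes = set()
for x in G:
    double_coset = {mul(mul(a, x), b) for a in A for b in B}   # AxB
    conj_B = {mul(mul(x, b), inverse(x)) for b in B}           # x B x^{-1}
    # o(AxB) * o(A n xBx^{-1}) == o(A) * o(B)
    assert len(double_coset) * len(A & conj_B) == len(A) * len(B)
    sizes.add(len(double_coset))

print(sorted(sizes))
```

Different double cosets of the same pair $A$, $B$ may have different sizes, depending on how large $A \cap xBx^{-1}$ is; this is exactly the feature the proofs that follow exploit.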

We now come to the gut step in this third proof of Sylow’s theorem.


LEMMA 2.12.5 Let $G$ be a finite group and suppose that $G$ is a subgroup of the
finite group $M$. Suppose further that $M$ has a $p$-Sylow subgroup $Q$. Then $G$ has a
$p$-Sylow subgroup $P$. In fact, $P = G \cap xQx^{-1}$ for some $x \in M$.


Proof. Before starting the details of the proof, we translate the hypotheses
somewhat. Suppose that $p^m \mid o(M)$, $p^{m+1} \nmid o(M)$, $Q$ is a subgroup
of $M$ of order $p^m$. Let $o(G) = p^n t$ where $p \nmid t$. We want to produce a
subgroup $P$ in $G$ of order $p^n$.

Consider the double coset decomposition of $M$ given by $G$ and $Q$;
$M = \bigcup GxQ$. By Lemma 2.12.4,

$$o(GxQ) = \frac{o(G)\,o(Q)}{o(G \cap xQx^{-1})} = \frac{p^n t\, p^m}{o(G \cap xQx^{-1})}.$$

Since $G \cap xQx^{-1}$ is a subgroup of $xQx^{-1}$, its order is $p^{m_x}$. Since it is
also a subgroup of $G$, $p^{m_x}$ divides $p^n t$, so $m_x \le n$. We claim that
$m_x = n$ for some $x \in M$. If not, then $m_x < n$ for every $x$, and

$$o(GxQ) = \frac{p^n t\, p^m}{p^{m_x}} = t\,p^{m+n-m_x},$$

so is divisible by $p^{m+1}$. Now, since $M = \bigcup GxQ$, and this is a disjoint union,
$o(M) = \sum o(GxQ)$, the sum running over one element from each double
coset. But $p^{m+1} \mid o(GxQ)$; hence $p^{m+1} \mid o(M)$. This contradicts $p^{m+1} \nmid o(M)$.
Thus $m_x = n$ for some $x \in M$. But then $o(G \cap xQx^{-1}) = p^n$. Since
$G \cap xQx^{-1} = P$ is a subgroup of $G$ and has order $p^n$, the lemma is proved.

We now can easily prove Sylow’s theorem. By Cayley’s theorem
(Theorem 2.9.1) we can isomorphically embed our finite group $G$ in $S_n$,
the symmetric group of degree $n$. Pick $k$ so that $n < p^k$; then we can
isomorphically embed $S_n$ in $S_{p^k}$ (by acting on $1, 2, \ldots, n$ only in the set
$1, 2, \ldots, n, \ldots, p^k$), hence $G$ is isomorphically embedded in $S_{p^k}$. By
Lemma 2.12.2, $S_{p^k}$ has a $p$-Sylow subgroup. Hence, by Lemma 2.12.5,
$G$ must have a $p$-Sylow subgroup. This finishes the third proof of Sylow’s
theorem.

This third proof has given us quite a bit more. From it we have the 
machinery to get the other parts of Sylow’s theorem. 


THEOREM 2.12.2 (Second Part of Sylow’s Theorem) If $G$ is a finite
group, $p$ a prime and $p^n \mid o(G)$ but $p^{n+1} \nmid o(G)$, then any two subgroups of $G$ of
order $p^n$ are conjugate.


Proof. Let $A$, $B$ be subgroups of $G$, each of order $p^n$. We want to show
that $A = gBg^{-1}$ for some $g \in G$.

Decompose $G$ into double cosets of $A$ and $B$; $G = \bigcup AxB$. Now, by
Lemma 2.12.4,

$$o(AxB) = \frac{o(A)\,o(B)}{o(A \cap xBx^{-1})}.$$

If $A \neq xBx^{-1}$ for every $x \in G$ then $o(A \cap xBx^{-1}) = p^m$ where $m < n$.
Thus

$$o(AxB) = \frac{o(A)\,o(B)}{p^m} = \frac{p^{2n}}{p^m} = p^{2n-m}$$

and $2n - m \ge n + 1$. Since $p^{n+1} \mid o(AxB)$ for every $x$ and since $o(G) =
\sum o(AxB)$, we would get the contradiction $p^{n+1} \mid o(G)$. Thus $A = gBg^{-1}$
for some $g \in G$. This is the assertion of the theorem.


Knowing that for a given prime $p$ all $p$-Sylow subgroups of $G$ are conjugate
allows us to count up precisely how many such $p$-Sylow subgroups there
are in $G$. The argument is exactly as that given in proving Theorem 2.11.1.
In some earlier problems (see, in particular, Problem 16, Section 2.5) we
discussed the normalizer $N(H)$, of a subgroup, defined by $N(H) =
\{x \in G \mid xHx^{-1} = H\}$. Then, as in the proof of Theorem 2.11.1, we have
that the number of distinct conjugates, $xHx^{-1}$, of $H$ in $G$ is the index of $N(H)$ in $G$.
Since all $p$-Sylow subgroups are conjugate we have


LEMMA 2.12.6 The number of $p$-Sylow subgroups in $G$ equals $o(G)/o(N(P))$,
where $P$ is any $p$-Sylow subgroup of $G$. In particular, this number is a divisor of $o(G)$.

However, much more can be said about the number of $p$-Sylow subgroups
there are, for a given prime $p$, in $G$. We go into this now. The technique
will involve double cosets again.

THEOREM 2.12.3 (Third Part of Sylow’s Theorem) The number of
$p$-Sylow subgroups in $G$, for a given prime, is of the form $1 + kp$.

Proof. Let $P$ be a $p$-Sylow subgroup of $G$. We decompose $G$ into double
cosets of $P$ and $P$. Thus $G = \bigcup PxP$. We now ask: How many elements
are there in $PxP$? By Lemma 2.12.4 we know the answer:

$$o(PxP) = \frac{o(P)^2}{o(P \cap xPx^{-1})}.$$

Thus, if $P \cap xPx^{-1} \neq P$ then $p^{n+1} \mid o(PxP)$, where $p^n = o(P)$.
Paraphrasing this: if $x \notin N(P)$ then $p^{n+1} \mid o(PxP)$. Also, if $x \in N(P)$, then $PxP =
P(Px) = P^2x = Px$, so $o(PxP) = p^n$ in this case.

Now

$$o(G) = \sum_{x \in N(P)} o(PxP) + \sum_{x \notin N(P)} o(PxP),$$

where each sum runs over one element from each double coset. However,
if $x \in N(P)$, since $PxP = Px$, the first sum is merely $\sum_{x \in N(P)} o(Px)$ over
the distinct cosets of $P$ in $N(P)$. Thus this first sum is just $o(N(P))$. What
about the second sum? We saw that each of its constituent terms is divisible
by $p^{n+1}$, hence

$$p^{n+1} \mid \sum_{x \notin N(P)} o(PxP).$$

We can thus write this second sum as

$$\sum_{x \notin N(P)} o(PxP) = p^{n+1}u.$$

Therefore $o(G) = o(N(P)) + p^{n+1}u$, so

$$\frac{o(G)}{o(N(P))} = 1 + \frac{p^{n+1}u}{o(N(P))}.$$

Now $o(N(P)) \mid o(G)$ since $N(P)$ is a subgroup of $G$, hence $p^{n+1}u/o(N(P))$
is an integer. Also, since $p^{n+1} \nmid o(G)$, $p^{n+1}$ can’t divide $o(N(P))$. But then
$p^{n+1}u/o(N(P))$ must be divisible by $p$, so we can write $p^{n+1}u/o(N(P))$ as $kp$,
where $k$ is an integer. Feeding this information back into our equation
above, we have

$$\frac{o(G)}{o(N(P))} = 1 + kp.$$

Recalling that $o(G)/o(N(P))$ is the number of $p$-Sylow subgroups in $G$,
we have the theorem.
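
The second and third parts of Sylow's theorem, together with Lemma 2.12.6, can be watched at work in $S_4$. Since all $p$-Sylow subgroups are conjugate, collecting the conjugates of one of them lists them all. The sketch below assumes the same tuple encoding of permutations on $0, \ldots, 3$ as a convenience; `mul`, `inverse`, `generate`, `conjugate`, `conjugates` and `normalizer` are illustrative helper names only.

```python
from itertools import permutations

def mul(s, t):
    """Product 'first s, then t' of permutations stored as tuples on 0..n-1."""
    return tuple(t[s[i]] for i in range(len(s)))

def inverse(s):
    """Inverse permutation of s."""
    inv = [0] * len(s)
    for i, image in enumerate(s):
        inv[image] = i
    return tuple(inv)

def generate(gens):
    """Subgroup generated by gens (closure under right multiplication)."""
    identity = tuple(range(len(gens[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in gens:
            gh = mul(g, h)
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

def conjugate(H, x):
    """The subgroup x^{-1} H x, as a frozenset."""
    return frozenset(mul(mul(inverse(x), h), x) for h in H)

def conjugates(H, G):
    """All distinct conjugates of H in G."""
    return {conjugate(H, x) for x in G}

def normalizer(H, G):
    """N(H): the elements of G that conjugate H onto itself."""
    return [x for x in G if conjugate(H, x) == frozenset(H)]

G = list(permutations(range(4)))                 # S_4, of order 24 = 2^3 * 3
sylow = {
    2: generate([(2, 3, 0, 1), (1, 0, 2, 3)]),   # order 8, as constructed earlier
    3: generate([(1, 2, 0, 3)]),                 # order 3, generated by a 3-cycle
}

counts = {}
for p, P in sylow.items():
    counts[p] = len(conjugates(P, G))
    assert counts[p] * len(normalizer(P, G)) == len(G)   # Lemma 2.12.6
    assert counts[p] % p == 1                            # Theorem 2.12.3

print(counts)
```

For $S_4$ this reports three 2-Sylow subgroups and four 3-Sylow subgroups, each count being a divisor of 24 of the form $1 + kp$.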


In Problems 20-24 in the Supplementary Problems at the end of this 
chapter, there is outlined another approach to proving the second and third 
parts of Sylow’s theorem. 

We close this section by demonstrating how the various parts of Sylow’s 
theorem can be used to gain a great deal of information about finite groups.

Let $G$ be a group of order $11^2 \cdot 13^2$. We want to determine how many
11-Sylow subgroups and how many 13-Sylow subgroups there are in $G$.
The number of 11-Sylow subgroups, by Theorem 2.12.3, is of the form
$1 + 11k$. By Lemma 2.12.6, this must divide $11^2 \cdot 13^2$; being prime to 11,
it must divide $13^2$. Can $13^2$ have a factor of the form $1 + 11k$? Clearly no,
other than 1 itself. Thus $1 + 11k = 1$, and so there must be only one
11-Sylow subgroup in $G$. Since all 11-Sylow subgroups are conjugate (Theorem
2.12.2) we conclude that the 11-Sylow subgroup is normal in $G$.

What about the 13-Sylow subgroups? Their number is of the form
$1 + 13k$ and must divide $11^2 \cdot 13^2$, hence must divide $11^2$. Here too we
conclude that there can be only one 13-Sylow subgroup in $G$, and it must
be normal.

We now know that $G$ has a normal subgroup $A$ of order $11^2$ and a normal
subgroup $B$ of order $13^2$. By the corollary to Theorem 2.11.2, any group
of order $p^2$ is abelian; hence $A$ and $B$ are both abelian. Since $A \cap B = (e)$
we easily get $AB = G$. Finally, if $a \in A$, $b \in B$, then $aba^{-1}b^{-1} =
a(ba^{-1}b^{-1}) \in A$ since $A$ is normal, and $aba^{-1}b^{-1} = (aba^{-1})b^{-1} \in B$ since
$B$ is normal. Thus $aba^{-1}b^{-1} \in A \cap B = (e)$. This gives us $aba^{-1}b^{-1} = e$
and so $ab = ba$ for $a \in A$, $b \in B$. This, together with $AB = G$, $A$, $B$ abelian,
allows us to conclude that $G$ is abelian. Hence any group of order $11^2 \cdot 13^2$
must be abelian.

We give one other illustration of the use of the various parts of Sylow’s
theorem. Let $G$ be a group of order 72; $o(G) = 2^3 3^2$. How many 3-Sylow
subgroups can there be in $G$? If this number is $t$, then, according to Theorem
2.12.3, $t = 1 + 3k$. According to Lemma 2.12.6, $t \mid 72$, and since $t$ is
prime to 3, we must have $t \mid 8$. The only factors of 8 of the form $1 + 3k$
are 1 and 4; hence $t = 1$ or $t = 4$ are the only possibilities. In other words
$G$ has either one 3-Sylow subgroup or 4 such.
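
The arithmetic in both illustrations follows one recipe: strip the full power of $p$ from $o(G)$ and keep the divisors of what remains that are of the form $1 + kp$. A minimal sketch of that recipe follows; the function name `sylow_count_candidates` is made up for this example. It lists only the values the theorems allow, not the values that actually occur in a particular group.

```python
def sylow_count_candidates(order, p):
    """Divisors d of `order` that are prime to p and satisfy d = 1 + kp.

    These are the only values permitted for the number of p-Sylow subgroups
    by Lemma 2.12.6 (the number divides o(G)) and Theorem 2.12.3 (it is 1 + kp).
    """
    m = order
    while m % p == 0:      # remove the full power of p from o(G)
        m //= p
    return [d for d in range(1, m + 1) if m % d == 0 and d % p == 1]

print(sylow_count_candidates(11**2 * 13**2, 11))
print(sylow_count_candidates(11**2 * 13**2, 13))
print(sylow_count_candidates(72, 3))
```

For the order $11^2 \cdot 13^2$ the only candidate is 1 for both primes, and for order 72 with $p = 3$ the candidates are 1 and 4, in agreement with the discussion above.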

If $G$ has only one 3-Sylow subgroup, since all 3-Sylow subgroups are
conjugate, this 3-Sylow subgroup must be normal in $G$. In this case $G$
would certainly contain a nontrivial normal subgroup. On the other hand,
if the number of 3-Sylow subgroups of $G$ is 4, by Lemma 2.12.6 the index of
$N$ in $G$ is 4, where $N$ is the normalizer of a 3-Sylow subgroup. But $72 \nmid 4! =
(i(N))!$. By Lemma 2.9.1 $N$ must contain a nontrivial normal subgroup of $G$
(of order at least 3). Thus here again we can conclude that $G$ contains a
nontrivial normal subgroup. The upshot of the discussion is that any group
of order 72 must have a nontrivial normal subgroup, hence cannot be
simple.


Problems 


1. Adapt the second proof given of Sylow’s theorem to prove directly
that if $p$ is a prime and $p^\alpha \mid o(G)$, then $G$ has a subgroup of order $p^\alpha$.

2. If $x > 0$ is a real number, define $[x]$ to be $m$, where $m$ is that integer
such that $m \le x < m + 1$. If $p$ is a prime, show that the power of
$p$ which exactly divides $n!$ is given by

$$\left[\frac{n}{p}\right] + \left[\frac{n}{p^2}\right] + \cdots + \left[\frac{n}{p^r}\right] + \cdots.$$



3. Use the method for constructing the $p$-Sylow subgroup of $S_{p^k}$ to find
generators for

(a) a 2-Sylow subgroup in $S_8$. (b) a 3-Sylow subgroup in $S_9$.

4. Adopt the method used in Problem 3 to find generators for

(a) a 2-Sylow subgroup of $S_6$. (b) a 3-Sylow subgroup of $S_6$.

5. If $p$ is a prime number, give explicit generators for a $p$-Sylow
subgroup of $S_{p^2}$.

6. Discuss the number and nature of the 3-Sylow subgroups and
5-Sylow subgroups of a group of order $3^2 \cdot 5^2$.

7. Let G be a group of order 30. 

(a) Show that a 3-Sylow subgroup or a 5-Sylow subgroup of G 
must be normal in G. 

(b) From part (a) show that every 3-Sylow subgroup and every 
5-Sylow subgroup of G must be normal in G. 

(c) Show that G has a normal subgroup of order 15. 

(d) From part (c) classify all groups of order 30. 

(e) How many different nonisomorphic groups of order 30 are there? 

8. If G is a group of order 231, prove that the 11-Sylow subgroup is in 
the center of G. 


9. If G is a group of order 385 show that its 11-Sylow subgroup is normal 
and its 7-Sylow subgroup is in the center of G. 

10. If $G$ is of order 108 show that $G$ has a normal subgroup of order $3^k$,
where $k \ge 2$.

11. If $o(G) = pq$, $p$ and $q$ distinct primes, $p < q$, show
(a) if $p \nmid (q - 1)$, then $G$ is cyclic.

*(b) if $p \mid (q - 1)$, then there exists a unique non-abelian group of
order $pq$.

*12. Let $G$ be a group of order $pqr$, $p < q < r$ primes. Prove

(a) the $r$-Sylow subgroup is normal in $G$.

(b) $G$ has a normal subgroup of order $qr$.

(c) if $q \nmid (r - 1)$, the $q$-Sylow subgroup of $G$ is normal in $G$.

13. If $G$ is of order $p^2q$, $p$, $q$ primes, prove that $G$ has a nontrivial
normal subgroup.

*14. If $G$ is of order $p^2q$, $p$, $q$ primes, prove that either a $p$-Sylow
subgroup or a $q$-Sylow subgroup of $G$ must be normal in $G$.

15. Let $G$ be a finite group in which $(ab)^p = a^pb^p$ for every $a, b \in G$,
where $p$ is a prime dividing $o(G)$. Prove
(a) The $p$-Sylow subgroup of $G$ is normal in $G$.

*(b) If $P$ is the $p$-Sylow subgroup of $G$, then there exists a normal
subgroup $N$ of $G$ with $P \cap N = (e)$ and $PN = G$.

(c) $G$ has a nontrivial center.

**16. If $G$ is a finite group and its $p$-Sylow subgroup $P$ lies in the center of
$G$, prove that there exists a normal subgroup $N$ of $G$ with $P \cap N =
(e)$ and $PN = G$.

*17. If $H$ is a subgroup of $G$, recall that $N(H) = \{x \in G \mid xHx^{-1} = H\}$.
If $P$ is a $p$-Sylow subgroup of $G$, prove that $N(N(P)) = N(P)$.

*18. Let $P$ be a $p$-Sylow subgroup of $G$ and suppose $a$, $b$ are in the center
of $P$. Suppose further that $a = xbx^{-1}$ for some $x \in G$. Prove that
there exists a $y \in N(P)$ such that $a = yby^{-1}$.

**19. Let $G$ be a finite group and suppose that $\phi$ is an automorphism of $G$
such that $\phi^3$ is the identity automorphism. Suppose further that
$\phi(x) = x$ implies that $x = e$. Prove that for every prime $p$ which
divides $o(G)$, the $p$-Sylow subgroup is normal in $G$.

#20. Let $G$ be the group of $n \times n$ matrices over the integers modulo $p$,
$p$ a prime, which are invertible. Find a $p$-Sylow subgroup of $G$.

21. Find the possible number of 11-Sylow subgroups, 7-Sylow subgroups,
and 5-Sylow subgroups in a group of order $5^2 \cdot 7 \cdot 11$.

22. If $G$ is $S_3$ and $A = ((1\ 2))$ in $G$, find all the double cosets $AxA$ of
$A$ in $G$.

23. If $G$ is $S_4$ and $A = ((1\ 2\ 3\ 4))$, $B = ((1\ 2))$, find all the double
cosets $AxB$ of $A$, $B$ in $G$.

24. If $G$ is the dihedral group of order 18 generated by $a^2 = b^9 = e$,
$ab = b^{-1}a$, find the double cosets for $H$, $K$ in $G$, where $H = (a)$
and $K = (b^3)$.

2.13 Direct Products 

On several occasions in this chapter we have had a need for constructing a 
new group from some groups we already had on hand. For instance, 
towards the end of Section 2.8, we built up a new group using a given group 
and one of its automorphisms. A special case of this type of construction 
has been seen earlier in the recurring example of the dihedral group. 

However, no attempt had been made for some systematic device for
constructing new groups from old. We shall do so now. The method
represents the most simple-minded, straightforward way of combining groups
to get other groups.

We first do it for two groups—not that two is sacrosanct. However, 
with this experience behind us, we shall be able to handle the case of any 
finite number easily and with dispatch. Not that any finite number is 
sacrosanct either; we could equally well carry out the discussion in the 
wider setting of any number of groups. However, we shall have no need for 
so general a situation here, so we settle for the case of any finite number of 
groups as our ultimate goal. 

Let $A$ and $B$ be any two groups and consider the Cartesian product
(which we discussed in Chapter 1) $G = A \times B$ of $A$ and $B$. $G$ consists
of all ordered pairs $(a, b)$, where $a \in A$ and $b \in B$. Can we use the operations
in $A$ and $B$ to endow $G$ with a product in such a way that $G$ is a group?
Why not try the obvious? Multiply componentwise. That is, let us define,
for $(a_1, b_1)$ and $(a_2, b_2)$ in $G$, their product via $(a_1, b_1)(a_2, b_2) = (a_1a_2, b_1b_2)$.
Here, the product $a_1a_2$ in the first component is the product of the elements
$a_1$ and $a_2$ as calculated in the group $A$. The product $b_1b_2$ in the second
component is that of $b_1$ and $b_2$ as elements in the group $B$.

With this definition we at least have a product defined in G. Is G a 
group relative to this product? The answer is yes, and is easy to verify. 
We do so now. 

First we do the associative law. Let $(a_1, b_1)$, $(a_2, b_2)$, and $(a_3, b_3)$ be
three elements of $G$. Then $((a_1, b_1)(a_2, b_2))(a_3, b_3) = (a_1a_2, b_1b_2)(a_3, b_3) =
((a_1a_2)a_3, (b_1b_2)b_3)$, while $(a_1, b_1)((a_2, b_2)(a_3, b_3)) = (a_1, b_1)(a_2a_3, b_2b_3) =
(a_1(a_2a_3), b_1(b_2b_3))$. The associativity of the product in $A$ and in $B$ then
shows us that our product in $G$ is indeed associative.

Now to the unit element. What would be more natural than to try
$(e, f)$, where $e$ is the unit element of $A$ and $f$ that of $B$, as the proposed
unit element for $G$? We have $(a, b)(e, f) = (ae, bf) = (a, b)$ and
$(e, f)(a, b) = (ea, fb) = (a, b)$. Thus $(e, f)$ acts as a unit element in $G$.

Finally, we need the inverse in $G$ for any element of $G$. Here, too,
why not try the obvious? Let $(a, b) \in G$; try $(a^{-1}, b^{-1})$ as its inverse.
Now $(a, b)(a^{-1}, b^{-1}) = (aa^{-1}, bb^{-1}) = (e, f)$ and $(a^{-1}, b^{-1})(a, b) =
(a^{-1}a, b^{-1}b) = (e, f)$, so that $(a^{-1}, b^{-1})$ does serve as the inverse for $(a, b)$.

With this we have verified that G = A x B is a group. We call it the 
external direct product of A and B. 
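
The verification just carried out can be mirrored on a small concrete case. The sketch below forms $A \times B$ with $A = S_3$ (permutations stored as tuples on $0, 1, 2$) and $B$ the integers modulo 2 under addition, and checks the three group axioms by exhaustion. The choice of groups and the names `perm_mul`, `add_mod2`, `op` are assumptions made only for this illustration.

```python
from itertools import permutations, product

def perm_mul(s, t):
    """Product in A = S_3: 'first s, then t' on tuples over 0..2."""
    return tuple(t[s[i]] for i in range(len(s)))

def add_mod2(a, b):
    """Product in B: addition modulo 2."""
    return (a + b) % 2

A = list(permutations(range(3)))   # S_3, unit element e = (0, 1, 2)
B = [0, 1]                         # integers mod 2, unit element f = 0
G = list(product(A, B))            # all ordered pairs (a, b)

def op(x, y):
    """Componentwise product (a1, b1)(a2, b2) = (a1 a2, b1 b2)."""
    return (perm_mul(x[0], y[0]), add_mod2(x[1], y[1]))

unit = ((0, 1, 2), 0)              # the pair (e, f)

# Associativity, unit element, and inverses, each checked exhaustively.
assert all(op(op(x, y), z) == op(x, op(y, z)) for x in G for y in G for z in G)
assert all(op(x, unit) == x == op(unit, x) for x in G)
assert all(any(op(x, y) == unit == op(y, x) for y in G) for x in G)

print(len(G))
```

Nothing about the check depends on the particular $A$ and $B$; any two finite groups given by an element list and a product function could be substituted.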

Since G = A x B has been built up from A and B in such a trivial 
manner, we would expect that the structure of A and B would reflect heavily 
in that of G. This is indeed the case. Knowing A and B completely gives 
us complete information, structurally, about A x B. 

The construction of G = A x B has been from the outside, external. 
Now we want to turn the affair around and try to carry it out internally in G. 

Consider $\bar{A} = \{(a, f) \in G \mid a \in A\} \subset G = A \times B$, where $f$ is the unit
element of $B$. What would one expect of $\bar{A}$? Answer: $\bar{A}$ is a subgroup of
$G$ and is isomorphic to $A$. To effect this isomorphism, define $\phi: A \to \bar{A}$
by $\phi(a) = (a, f)$ for $a \in A$. It is trivial that $\phi$ is an isomorphism of $A$
onto $\bar{A}$. It is equally trivial that $\bar{A}$ is a subgroup of $G$. Furthermore, $\bar{A}$ is
normal in $G$. For if $(a, f) \in \bar{A}$ and $(a_1, b_1) \in G$, then $(a_1, b_1)(a, f)(a_1, b_1)^{-1} =
(a_1, b_1)(a, f)(a_1^{-1}, b_1^{-1}) = (a_1aa_1^{-1}, b_1fb_1^{-1}) = (a_1aa_1^{-1}, f) \in \bar{A}$. So we
have an isomorphic copy, $\bar{A}$, of $A$ in $G$ which is a normal subgroup of $G$.

What we did for $A$ we can also do for $B$. If $\bar{B} = \{(e, b) \in G \mid b \in B\}$,
then $\bar{B}$ is isomorphic to $B$ and is a normal subgroup of $G$.

We claim a little more, namely $G = \bar{A}\bar{B}$ and every $g \in G$ has a unique
decomposition in the form $g = \bar{a}\bar{b}$ with $\bar{a} \in \bar{A}$ and $\bar{b} \in \bar{B}$. For, $g = (a, b) =
(a, f)(e, b)$ and, since $(a, f) \in \bar{A}$ and $(e, b) \in \bar{B}$, we do have $g = \bar{a}\bar{b}$ with
$\bar{a} = (a, f)$ and $\bar{b} = (e, b)$. Why is this unique? If $(a, b) = \bar{x}\bar{y}$ where
$\bar{x} \in \bar{A}$ and $\bar{y} \in \bar{B}$, then $\bar{x} = (x, f)$, $x \in A$ and $\bar{y} = (e, y)$, $y \in B$; thus $(a, b) =
\bar{x}\bar{y} = (x, f)(e, y) = (x, y)$. This gives $x = a$ and $y = b$, and so $\bar{x} = \bar{a}$
and $\bar{y} = \bar{b}$.

Thus we have realized $G$ as an internal product $\bar{A}\bar{B}$ of two normal
subgroups, $\bar{A}$ isomorphic to $A$, $\bar{B}$ to $B$ in such a way that every element $g \in G$
has a unique representation in the form $g = \bar{a}\bar{b}$, with $\bar{a} \in \bar{A}$ and $\bar{b} \in \bar{B}$.

We leave the discussion of the product of two groups and go to the case 
of n groups, n > 1 any integer. 

Let $G_1, G_2, \ldots, G_n$ be any $n$ groups. Let $G = G_1 \times G_2 \times \cdots \times G_n =
\{(g_1, g_2, \ldots, g_n) \mid g_i \in G_i\}$ be the set of all ordered $n$-tuples, that is, the
Cartesian product of $G_1, G_2, \ldots, G_n$. We define a product in $G$ via
$(g_1, g_2, \ldots, g_n)(g_1', g_2', \ldots, g_n') = (g_1g_1', g_2g_2', \ldots, g_ng_n')$, that is, via
componentwise multiplication. The product in the $i$th component is carried out
in the group $G_i$. Then $G$ is a group in which $(e_1, e_2, \ldots, e_n)$ is the unit
element, where each $e_i$ is the unit element of $G_i$, and where
$(g_1, g_2, \ldots, g_n)^{-1} = (g_1^{-1}, g_2^{-1}, \ldots, g_n^{-1})$. We call this group $G$ the
external direct product of $G_1, G_2, \ldots, G_n$.

In $G = G_1 \times G_2 \times \cdots \times G_n$ let $\bar{G}_i = \{(e_1, e_2, \ldots, e_{i-1}, g_i, e_{i+1}, \ldots, e_n) \mid
g_i \in G_i\}$. Then $\bar{G}_i$ is a normal subgroup of $G$ and is isomorphic to $G_i$.
Moreover, $G = \bar{G}_1\bar{G}_2 \cdots \bar{G}_n$ and every $g \in G$ has a unique decomposition
$g = \bar{g}_1\bar{g}_2 \cdots \bar{g}_n$, where $\bar{g}_1 \in \bar{G}_1, \ldots, \bar{g}_n \in \bar{G}_n$. We leave the verification of
these facts to the reader.

Here, too, as in the case $A \times B$, we have realized the group $G$ internally
as the product of normal subgroups $\bar{G}_1, \ldots, \bar{G}_n$ in such a way that every
element is uniquely representable as a product of elements $\bar{g}_1 \cdots \bar{g}_n$, where
each $\bar{g}_i \in \bar{G}_i$. With this motivation we make the


DEFINITION Let $G$ be a group and $N_1, N_2, \ldots, N_n$ normal subgroups of
$G$ such that

1. $G = N_1N_2 \cdots N_n$.

2. Given $g \in G$ then $g = m_1m_2 \cdots m_n$, $m_i \in N_i$ in a unique way.

We then say that $G$ is the internal direct product of $N_1, N_2, \ldots, N_n$.

Before proceeding let’s look at an example of a group $G$ which is the
internal direct product of some of its subgroups. Let $G$ be a finite abelian
group of order $p_1^{\alpha_1}p_2^{\alpha_2} \cdots p_k^{\alpha_k}$ where $p_1, p_2, \ldots, p_k$ are distinct primes and
each $\alpha_i > 0$. If $P_1, \ldots, P_k$ are the $p_1$-Sylow subgroup, $\ldots$, $p_k$-Sylow
subgroup respectively of $G$, then $G$ is the internal direct product of
$P_1, P_2, \ldots, P_k$ (see Problem 5).

We continue with the general discussion. Suppose that $G$ is the internal
direct product of the normal subgroups $N_1, \ldots, N_n$. The $N_1, \ldots, N_n$
are groups in their own right; forget that they are normal subgroups of $G$
for the moment. Thus we can form the group $T = N_1 \times N_2 \times \cdots \times N_n$,
the external direct product of $N_1, \ldots, N_n$. One feels that $G$ and $T$ should
be related. Our aim, in fact, is to show that $G$ is isomorphic to $T$. If we
could establish this then we could abolish the prefix external and internal
in the phrases external direct product, internal direct product (after all
these would be the same group up to isomorphism) and just talk about the
direct product.

We start with 

LEMMA 2.13.1 Suppose that $G$ is the internal direct product of $N_1, \ldots, N_n$.
Then for $i \neq j$, $N_i \cap N_j = (e)$, and if $a \in N_i$, $b \in N_j$ then $ab = ba$.

Proof. Suppose that $x \in N_i \cap N_j$. Then we can write $x$ as

$$x = e_1 \cdots e_{i-1}\,x\,e_{i+1} \cdots e_j \cdots e_n,$$

where $e_t = e$, viewing $x$ as an element in $N_i$. Similarly, we can write $x$ as

$$x = e_1 \cdots e_i \cdots e_{j-1}\,x\,e_{j+1} \cdots e_n,$$

where $e_t = e$, viewing $x$ as an element of $N_j$. But every element (and so,
in particular $x$) has a unique representation in the form $m_1m_2 \cdots m_n$,
where $m_1 \in N_1, \ldots, m_n \in N_n$. Since the two decompositions in this form for
$x$ must coincide, the entry from $N_i$ in each must be equal. In our
first decomposition this entry is $x$, in the other it is $e$; hence $x = e$.
Thus $N_i \cap N_j = (e)$ for $i \neq j$.

Suppose $a \in N_i$, $b \in N_j$, and $i \neq j$. Then $aba^{-1} \in N_j$ since $N_j$ is normal;
thus $aba^{-1}b^{-1} \in N_j$. Similarly, since $a^{-1} \in N_i$, $ba^{-1}b^{-1} \in N_i$, whence
$aba^{-1}b^{-1} \in N_i$. But then $aba^{-1}b^{-1} \in N_i \cap N_j = (e)$. Thus $aba^{-1}b^{-1} = e$;
this gives the desired result $ab = ba$.

One should point out that if $K_1, \ldots, K_n$ are normal subgroups of $G$
such that $G = K_1K_2 \cdots K_n$ and $K_i \cap K_j = (e)$ for $i \neq j$ it need not be
true that $G$ is the internal direct product of $K_1, \ldots, K_n$. A more stringent
condition is needed (see Problems 8 and 9).

We now can prove the desired isomorphism between the external and 
internal direct products that was stated earlier. 

THEOREM 2.13.1 Let $G$ be a group and suppose that $G$ is the internal direct
product of $N_1, \ldots, N_n$. Let $T = N_1 \times N_2 \times \cdots \times N_n$. Then $G$ and $T$
are isomorphic.

Proof. Define the mapping $\psi: T \to G$ by

$$\psi((b_1, b_2, \ldots, b_n)) = b_1b_2 \cdots b_n,$$

where each $b_i \in N_i$, $i = 1, \ldots, n$. We claim that $\psi$ is an isomorphism
of $T$ onto $G$.

To begin with, $\psi$ is certainly onto; for, since $G$ is the internal direct
product of $N_1, \ldots, N_n$, if $x \in G$ then $x = a_1a_2 \cdots a_n$ for some $a_1 \in N_1, \ldots,
a_n \in N_n$. But then $\psi((a_1, a_2, \ldots, a_n)) = a_1a_2 \cdots a_n = x$. The mapping
$\psi$ is one-to-one by the uniqueness of the representation of every element as
a product of elements from $N_1, \ldots, N_n$. For, if $\psi((a_1, \ldots, a_n)) =
\psi((c_1, \ldots, c_n))$ where $a_i \in N_i$, $c_i \in N_i$, for $i = 1, 2, \ldots, n$, then, by the
definition of $\psi$, $a_1a_2 \cdots a_n = c_1c_2 \cdots c_n$. The uniqueness in the definition
of internal direct product forces $a_1 = c_1$, $a_2 = c_2$, $\ldots$, $a_n = c_n$. Thus $\psi$
is one-to-one.

All that remains is to show that $\psi$ is a homomorphism of $T$ onto $G$.
If $X = (a_1, \ldots, a_n)$, $Y = (b_1, \ldots, b_n)$ are elements of $T$ then

$$\begin{aligned}
\psi(XY) &= \psi((a_1, \ldots, a_n)(b_1, \ldots, b_n)) \\
&= \psi((a_1b_1, a_2b_2, \ldots, a_nb_n)) \\
&= a_1b_1a_2b_2 \cdots a_nb_n.
\end{aligned}$$

However, by Lemma 2.13.1, $a_ib_j = b_ja_i$ if $i \neq j$. This tells us that
$a_1b_1a_2b_2 \cdots a_nb_n = a_1a_2 \cdots a_nb_1b_2 \cdots b_n$. Thus $\psi(XY) = a_1a_2 \cdots a_nb_1b_2 \cdots b_n$.
But we can recognize $a_1a_2 \cdots a_n$ as $\psi((a_1, a_2, \ldots, a_n)) = \psi(X)$ and $b_1b_2 \cdots b_n$
as $\psi(Y)$. We therefore have $\psi(XY) = \psi(X)\psi(Y)$. In short, we have shown
that $\psi$ is an isomorphism of $T$ onto $G$. This proves the theorem.
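
To see the map $\psi$ in a concrete setting, the sketch below takes for $G$ the integers modulo 15 that are prime to 15, under multiplication, with the two subgroups $N_1 = \{1, 11\}$ and $N_2 = \{1, 2, 4, 8\}$. This example is chosen here for illustration and does not come from the text; since $G$ is abelian, both subgroups are automatically normal. The code checks that $\psi((a, b)) = ab$ is one-to-one, onto, and multiplicative.

```python
from itertools import product
from math import gcd

n = 15
G = [x for x in range(1, n) if gcd(x, n) == 1]   # units modulo 15
N1 = [1, 11]                                     # subgroup of order 2
N2 = [1, 2, 4, 8]                                # subgroup of order 4

T = list(product(N1, N2))                        # external direct product N1 x N2

def psi(pair):
    """The map of the theorem: (a, b) goes to the product ab in G."""
    a, b = pair
    return (a * b) % n

def t_mul(x, y):
    """Componentwise product in T."""
    return ((x[0] * y[0]) % n, (x[1] * y[1]) % n)

images = [psi(t) for t in T]

# Each element of G arises exactly once: psi is onto and one-to-one,
# i.e. every g has a unique decomposition g = ab with a in N1, b in N2.
assert sorted(images) == G

# psi(XY) = psi(X) psi(Y) for all X, Y in T.
assert all(psi(t_mul(x, y)) == (psi(x) * psi(y)) % n for x in T for y in T)

print(dict(zip(T, images)))
```

The first assertion is the uniqueness condition in the definition of internal direct product, and the second is the homomorphism property, which in the proof rests on Lemma 2.13.1.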

Note one particular thing that the theorem proves. If a group $G$ is
isomorphic to an external direct product of certain groups $G_i$, then $G$ is,
in fact, the internal direct product of groups $\bar{G}_i$ isomorphic to the $G_i$. We
simply say that $G$ is the direct product of the $G_i$ (or $\bar{G}_i$).

In the next section we shall see that every finite abelian group is a direct
product of cyclic groups. Once we have this, we have the structure of all
finite abelian groups pretty well under our control.

One should point out that the analog of the direct product of groups 
exists in the study of almost all algebraic structures. We shall see this later 
for vector-spaces, rings, and modules. Theorems that describe such an
algebraic object in terms of direct products of more describable algebraic 
objects of the same kind (for example, the case of abelian groups above) are 
important theorems in general. Through such theorems we can reduce the 
study of a fairly complex algebraic situation to a much simpler one. 

Problems 

1. If $A$ and $B$ are groups, prove that $A \times B$ is isomorphic to $B \times A$.

2. If $G_1$, $G_2$, $G_3$ are groups, prove that $(G_1 \times G_2) \times G_3$ is isomorphic
to $G_1 \times G_2 \times G_3$. Care to generalize?

3. If $T = G_1 \times G_2 \times \cdots \times G_n$ prove that for each $i = 1, 2, \ldots, n$
there is a homomorphism $\phi_i$ of $T$ onto $G_i$. Find the kernel of $\phi_i$.

4. Let G be a group and let T = G x G. 

(a) Show that $D = \{(g, g) \in G \times G \mid g \in G\}$ is a group isomorphic
to G. 

(b) Prove that D is normal in T if and only if G is abelian. 

5. Let G be a finite abelian group. Prove that G is isomorphic to the 
direct product of its Sylow subgroups. 

6. Let A, B be cyclic groups of order m and n, respectively. Prove that 
A x B is cyclic if and only if m and n are relatively prime. 

7. Use the result of Problem 6 to prove the Chinese Remainder Theorem; 
namely, if m and n are relatively prime integers and u, v any two 
integers, then we can find an integer $x$ such that $x \equiv u \bmod m$ and
$x \equiv v \bmod n$.

8. Give an example of a group $G$ and normal subgroups $N_1, \ldots, N_n$ 
such that $G = N_1 N_2 \cdots N_n$ and $N_i \cap N_j = (e)$ for $i \neq j$ and yet 
$G$ is not the internal direct product of $N_1, \ldots, N_n$. 

9. Prove that $G$ is the internal direct product of the normal subgroups 
$N_1, \ldots, N_n$ if and only if 

1. $G = N_1 \cdots N_n$. 

2. $N_i \cap (N_1 N_2 \cdots N_{i-1} N_{i+1} \cdots N_n) = (e)$ for $i = 1, \ldots, n$. 

10. Let $G$ be a group, $K_1, \ldots, K_n$ normal subgroups of $G$. Suppose that 
$K_1 \cap K_2 \cap \cdots \cap K_n = (e)$. Let $V_i = G/K_i$. Prove that there is an 
isomorphism of $G$ into $V_1 \times V_2 \times \cdots \times V_n$. 

*11. Let $G$ be a finite abelian group such that it contains a subgroup 
$H_0 \neq (e)$ which lies in every subgroup $H \neq (e)$. Prove that $G$ must 
be cyclic. What can you say about $o(G)$? 

12. Let G be a finite abelian group. Using Problem 11 show that G is 
isomorphic to a subgroup of a direct product of a finite number of 
finite cyclic groups. 

13. Give an example of a finite non-abelian group $G$ which contains a 
subgroup $H_0 \neq (e)$ such that $H_0 \subset H$ for all subgroups $H \neq (e)$ of $G$. 

14. Show that every group of order $p^2$, $p$ a prime, is either cyclic or is 
isomorphic to the direct product of two cyclic groups each of order $p$. 

*15. Let $G = A \times A$ where $A$ is cyclic of order $p$, $p$ a prime. How many 
automorphisms does $G$ have? 

16. If $G = K_1 \times K_2 \times \cdots \times K_n$ describe the center of $G$ in terms of 
those of the $K_i$. 

17. If $G = K_1 \times K_2 \times \cdots \times K_n$ and $g \in G$, describe 

$N(g) = \{x \in G \mid xg = gx\}$. 

18. If $G$ is a finite group and $N_1, \ldots, N_n$ are normal subgroups of $G$ 
such that $G = N_1 N_2 \cdots N_n$ and $o(G) = o(N_1) o(N_2) \cdots o(N_n)$, prove 
that $G$ is the direct product of $N_1, N_2, \ldots, N_n$. 
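
As a numerical sanity check on Problems 6 and 7 (a check, not a proof), the following sketch models cyclic groups of order m and n additively as the integers mod m and mod n. The function names `is_cyclic_product` and `crt` are illustrative choices, not notation from the text, and the search in `crt` is deliberately brute force.

```python
from math import gcd

def is_cyclic_product(m, n):
    """True if Z_m x Z_n (written additively) has an element of order m*n."""
    def order(a, b):
        # The order of (a, b) is the lcm of the order of a in Z_m
        # and the order of b in Z_n.
        oa = m // gcd(a, m)
        ob = n // gcd(b, n)
        return oa * ob // gcd(oa, ob)
    return any(order(a, b) == m * n for a in range(m) for b in range(n))

def crt(u, v, m, n):
    """Brute-force search for x with x = u (mod m) and x = v (mod n).

    Returns None when no such x exists in the range 0 .. m*n - 1.
    """
    for x in range(m * n):
        if x % m == u % m and x % n == v % n:
            return x
    return None
```

For small m and n, `is_cyclic_product(m, n)` agrees with the condition that m and n be relatively prime, which is what Problem 6 asks the reader to prove in general.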

2.14 Finite Abelian Groups 

We close this chapter with a discussion (and description) of the structure 
of an arbitrary finite abelian group. The result which we shall obtain is a 
famous classical theorem, often referred to as the Fundamental Theorem on 
Finite Abelian Groups. It is a highly satisfying result because of its 
decisiveness. Rarely do we come out with so compact, succinct, and crisp a 
result. In it the structure of a finite abelian group is completely revealed, 
and by means of it we have a ready tool for attacking any structural problem 
about finite abelian groups. It even has some arithmetic consequences. 
For instance, one of its by-products is a precise count of how many 
nonisomorphic abelian groups there are of a given order. 

In all fairness one should add that this description of finite abelian groups 
is not as general as we can go and still get so sharp a theorem. As you shall 
see in Section 4.5, we completely describe all abelian groups generated by 
a finite set of elements, a situation which not only covers the finite abelian 
group case, but much more. 

We now state this very fundamental result. 

THEOREM 2.14.1 Every finite abelian group is the direct product of cyclic 
groups. 

Proof. Our first step is to reduce the problem to a slightly easier one. 
We have already indicated in the preceding section (see Problem 5 there) 
that any finite abelian group $G$ is the direct product of its Sylow subgroups. 
If we knew that each such Sylow subgroup was a direct product of cyclic 
groups we could put the results together for these Sylow subgroups to 
realize $G$ as a direct product of cyclic groups. Thus it suffices to prove the 
theorem for abelian groups of order $p^n$ where $p$ is a prime. 

So suppose that $G$ is an abelian group of order $p^n$. Our objective is to 
find elements $a_1, \ldots, a_k$ in $G$ such that every element $x \in G$ can be written 
in a unique fashion as $x = a_1^{\alpha_1} a_2^{\alpha_2} \cdots a_k^{\alpha_k}$. Note that if this were true and 
$a_1, \ldots, a_k$ were of order $p^{n_1}, \ldots, p^{n_k}$, where $n_1 \geq n_2 \geq \cdots \geq n_k$, then the 
maximal order of any element in $G$ would be $p^{n_1}$ (Prove!). This gives us 
a cue of how to go about finding the elements $a_1, \ldots, a_k$ that we seek. 

The procedure suggested by this is: let $a_1$ be an element of maximal 
order in $G$. How shall we pick $a_2$? Well, if $A_1 = (a_1)$ is the subgroup 
generated by $a_1$, then $a_2$ maps into an element of highest order in $G/A_1$. 
If we can successfully exploit this to find an appropriate $a_2$, and if $A_2 = 
(a_2)$, then $a_3$ would map into an element of maximal order in $G/A_1 A_2$, 
and so on. With this as guide we can now get down to the brass tacks of 
the proof. 

Let $a_1$ be an element in $G$ of highest possible order, $p^{n_1}$, and let $A_1 = 
(a_1)$. Pick $b_2$ in $G$ such that $\bar{b}_2$, the image of $b_2$ in $\bar{G} = G/A_1$, has maximal 
order $p^{n_2}$. Since the order of $\bar{b}_2$ divides that of $b_2$, and since the order of 
$a_1$ is maximal, we must have that $n_1 \geq n_2$. In order to get a direct product 
of $A_1$ with $(b_2)$ we would need $A_1 \cap (b_2) = (e)$; this might not be true 
for the initial choice of $b_2$, so we may have to adapt the element $b_2$. Suppose 
that $A_1 \cap (b_2) \neq (e)$; then, since $b_2^{p^{n_2}} \in A_1$ and is the first power of $b_2$ to 
fall in $A_1$ (by our mechanism of choosing $b_2$) we have that $b_2^{p^{n_2}} = a_1^{i}$. 
Therefore $(a_1^{i})^{p^{n_1 - n_2}} = (b_2^{p^{n_2}})^{p^{n_1 - n_2}} = b_2^{p^{n_1}} = e$, whence $a_1^{i p^{n_1 - n_2}} = e$. Since 
$a_1$ is of order $p^{n_1}$ we must have that $p^{n_1} \mid i p^{n_1 - n_2}$, and so $p^{n_2} \mid i$. Thus, 
recalling what $i$ is, we have $b_2^{p^{n_2}} = a_1^{i} = a_1^{j p^{n_2}}$. This tells us that if $a_2 = 
a_1^{-j} b_2$ then $a_2^{p^{n_2}} = e$. The element $a_2$ is indeed the element we seek. Let 
$A_2 = (a_2)$. We claim that $A_1 \cap A_2 = (e)$. For, suppose that $a_2^{t} \in A_1$; 
since $a_2 = a_1^{-j} b_2$, we get $(a_1^{-j} b_2)^{t} \in A_1$ and so $b_2^{t} \in A_1$. By choice of $b_2$, 
this last relation forces $p^{n_2} \mid t$, and since $a_2^{p^{n_2}} = e$ we must have that $a_2^{t} = e$. 
In short $A_1 \cap A_2 = (e)$. 

We continue one more step in the program we have outlined. Let 
$b_3 \in G$ map into an element of maximal order in $G/(A_1 A_2)$. If the order 
of the image of $b_3$ in $G/(A_1 A_2)$ is $p^{n_3}$, we claim that $n_3 \leq n_2 \leq n_1$. Why? 
By the choice of $n_2$, $b_3^{p^{n_2}} \in A_1$ so is certainly in $A_1 A_2$. Thus $n_3 \leq n_2$. Since 
$b_3^{p^{n_3}} \in A_1 A_2$, $b_3^{p^{n_3}} = a_1^{i_1} a_2^{i_2}$. We claim that $p^{n_3} \mid i_1$ and $p^{n_3} \mid i_2$. For, 
$b_3^{p^{n_2}} \in A_1$, hence $(a_1^{i_1} a_2^{i_2})^{p^{n_2 - n_3}} = (b_3^{p^{n_3}})^{p^{n_2 - n_3}} = b_3^{p^{n_2}} \in A_1$. This tells us 
that $a_2^{i_2 p^{n_2 - n_3}} \in A_1$ and so $p^{n_2} \mid i_2 p^{n_2 - n_3}$, which is to say, $p^{n_3} \mid i_2$. Also $b_3^{p^{n_1}} = 
e$, hence $(a_1^{i_1} a_2^{i_2})^{p^{n_1 - n_3}} = b_3^{p^{n_1}} = e$; this says that $a_1^{i_1 p^{n_1 - n_3}} \in A_1 \cap A_2 = (e)$, 
that is, $a_1^{i_1 p^{n_1 - n_3}} = e$. This yields that $p^{n_3} \mid i_1$. Let $i_1 = j_1 p^{n_3}$, $i_2 = j_2 p^{n_3}$; thus 
$b_3^{p^{n_3}} = a_1^{j_1 p^{n_3}} a_2^{j_2 p^{n_3}}$. Let $a_3 = a_1^{-j_1} a_2^{-j_2} b_3$, $A_3 = (a_3)$; note that $a_3^{p^{n_3}} = e$. 
We claim that $A_3 \cap (A_1 A_2) = (e)$. For if $a_3^{t} \in A_1 A_2$ then $(a_1^{-j_1} a_2^{-j_2} b_3)^{t} \in 
A_1 A_2$, giving us $b_3^{t} \in A_1 A_2$. But then $p^{n_3} \mid t$, whence, since $a_3^{p^{n_3}} = e$, we have 
$a_3^{t} = e$. In other words, $A_3 \cap (A_1 A_2) = (e)$. 

Continuing this way we get cyclic subgroups $A_1 = (a_1)$, $A_2 = 
(a_2), \ldots, A_k = (a_k)$ of order $p^{n_1}, p^{n_2}, \ldots, p^{n_k}$, respectively, with $n_1 \geq 
n_2 \geq \cdots \geq n_k$ such that $G = A_1 A_2 \cdots A_k$ and such that, for each $i$, 
$A_i \cap (A_1 A_2 \cdots A_{i-1}) = (e)$. This tells us that every $x \in G$ has a unique 
representation as $x = a_1' a_2' \cdots a_k'$ where $a_1' \in A_1, \ldots, a_k' \in A_k$. In other 
words, $G$ is the direct product of the cyclic subgroups $A_1, A_2, \ldots, A_k$. 
The theorem is now proved. 
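
The procedure of the proof can be tried out on a small example. The sketch below works in the multiplicative group of units mod 16, an abelian group of order $2^3$. It is a brute-force stand-in for the argument, not the argument itself: instead of adjusting an element $b$ to $a = a_1^{-j} b$, it searches directly for an element of largest order whose cyclic subgroup meets the product built so far only in the identity. The proof above shows that such an element exists at each stage when the group has prime power order, which is the assumption made here; the helper names are hypothetical.

```python
from math import gcd

def units(n):
    """Elements of the multiplicative group of integers mod n."""
    return [x for x in range(1, n) if gcd(x, n) == 1]

def cyclic(a, n):
    """The cyclic subgroup generated by a among the units mod n."""
    out, x = [1], a % n
    while x != 1:
        out.append(x)
        x = (x * a) % n
    return set(out)

def decompose(n):
    """Return [(generator, order), ...] with the units mod n equal to the
    direct product of the cyclic subgroups generated.

    Assumes the group of units mod n has prime power order.
    """
    G = set(units(n))
    H = {1}            # product of the cyclic subgroups chosen so far
    gens = []
    while H != G:
        # Candidates: nonidentity elements whose cyclic subgroup meets H trivially.
        candidates = (a for a in sorted(G)
                      if a != 1 and (cyclic(a, n) & H) == {1})
        best = max(candidates, key=lambda a: len(cyclic(a, n)))
        A = cyclic(best, n)
        gens.append((best, len(A)))
        H = {(h * c) % n for h in H for c in A}
    return gens
```

For n = 16 the search picks 3 (of order 4) and then 7 (of order 2), so the units mod 16 form a direct product of cyclic groups of orders 4 and 2.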

DEFINITION If $G$ is an abelian group of order $p^n$, $p$ a prime, and $G = 
A_1 \times A_2 \times \cdots \times A_k$ where each $A_i$ is cyclic of order $p^{n_i}$ with $n_1 \geq n_2 \geq 
\cdots \geq n_k > 0$, then the integers $n_1, n_2, \ldots, n_k$ are called the invariants 
of $G$. 

Just because we called the integers above the invariants of $G$ does not 
mean that they are really the invariants of $G$. That is, it is possible that we 
can assign different sets of invariants to $G$. We shall soon show that the 
invariants of $G$ are indeed unique and completely describe $G$. 

Note one other thing about the invariants of $G$. If $G = A_1 \times \cdots \times A_k$, 
where $A_i$ is cyclic of order $p^{n_i}$, $n_1 \geq n_2 \geq \cdots \geq n_k > 0$, then $o(G) = 
o(A_1) o(A_2) \cdots o(A_k)$, hence $p^n = p^{n_1} p^{n_2} \cdots p^{n_k} = p^{n_1 + n_2 + \cdots + n_k}$, whence $n = 
n_1 + n_2 + \cdots + n_k$. In other words, $n_1, n_2, \ldots, n_k$ give us a partition of $n$. 
We have already run into this concept earlier in studying the conjugate 
classes in the symmetric group. 

Before discussing the uniqueness of the invariants of $G$, one thing should 
be made absolutely clear: the elements $a_1, \ldots, a_k$ and the subgroups 
$A_1, \ldots, A_k$ which they generate, which arose above to give the 
decomposition of $G$ into a direct product of cyclic groups, are not unique. Let's 
see this in a very simple example. Let $G = \{e, a, b, ab\}$ be an abelian 
group of order 4 where $a^2 = b^2 = e$, $ab = ba$. Then $G = A \times B$ where 
$A = (a)$, $B = (b)$ are cyclic groups of order 2. But we have another 
decomposition of $G$ as a direct product, namely, $G = C \times B$ where 
$C = (ab)$ and $B = (b)$. So, even in this group of very small order, we can 
get distinct decompositions of the group as the direct product of cyclic 
groups. Our claim, which we now want to substantiate, is that while 
these cyclic subgroups are not unique, their orders are. 

DEFINITION If $G$ is an abelian group and $s$ is any integer, then $G(s) = 
\{x \in G \mid x^s = e\}$. 

Because $G$ is abelian it is evident that $G(s)$ is a subgroup of $G$. We now 
prove 

LEMMA 2.14.1 If $G$ and $G'$ are isomorphic abelian groups, then for every 
integer $s$, $G(s)$ and $G'(s)$ are isomorphic. 

Proof. Let $\phi$ be an isomorphism of $G$ onto $G'$. We claim that $\phi$ maps 
$G(s)$ isomorphically onto $G'(s)$. First we show that $\phi(G(s)) \subset G'(s)$. 
For, if $x \in G(s)$ then $x^s = e$, hence $\phi(x^s) = \phi(e) = e'$. But $\phi(x^s) = \phi(x)^s$; 
hence $\phi(x)^s = e'$ and so $\phi(x)$ is in $G'(s)$. Thus $\phi(G(s)) \subset G'(s)$. 

On the other hand, if $u' \in G'(s)$ then $(u')^s = e'$. But, since $\phi$ is onto, 
$u' = \phi(y)$ for some $y \in G$. Therefore $e' = (u')^s = \phi(y)^s = \phi(y^s)$. Because 
$\phi$ is one-to-one, we have $y^s = e$ and so $y \in G(s)$. Thus $\phi$ maps $G(s)$ 
onto $G'(s)$. 

Therefore since $\phi$ is one-to-one, onto, and a homomorphism from $G(s)$ 
to $G'(s)$, we have that $G(s)$ and $G'(s)$ are isomorphic. 

We continue with 


LEMMA 2.14.2 Let $G$ be an abelian group of order $p^n$, $p$ a prime. Suppose 
that $G = A_1 \times A_2 \times \cdots \times A_k$, where each $A_i = (a_i)$ is cyclic of order $p^{n_i}$, 
and $n_1 \geq n_2 \geq \cdots \geq n_k > 0$. If $m$ is an integer such that $n_t > m \geq n_{t+1}$ then 
$G(p^m) = B_1 \times \cdots \times B_t \times A_{t+1} \times \cdots \times A_k$ where $B_i$ is cyclic of order 
$p^m$, generated by $a_i^{p^{n_i - m}}$, for $i \leq t$. The order of $G(p^m)$ is $p^u$, where 

$$u = mt + \sum_{i=t+1}^{k} n_i.$$ 

Proof. First of all, we claim that $A_{t+1}, \ldots, A_k$ are all in $G(p^m)$. For, 
since $m \geq n_{t+1} \geq \cdots \geq n_k > 0$, if $j \geq t + 1$, $a_j^{p^m} = (a_j^{p^{n_j}})^{p^{m - n_j}} = e$. 
Hence $A_j$, for $j \geq t + 1$, lies in $G(p^m)$. 

Secondly, if $i \leq t$ then $n_i > m$ and $(a_i^{p^{n_i - m}})^{p^m} = a_i^{p^{n_i}} = e$, whence 
each such $a_i^{p^{n_i - m}}$ is in $G(p^m)$ and so the subgroup it generates, $B_i$, is also 
in $G(p^m)$. 

Since $B_1, \ldots, B_t, A_{t+1}, \ldots, A_k$ are all in $G(p^m)$, their product (which 
is direct, since the product $A_1 A_2 \cdots A_k$ is direct) is in $G(p^m)$. Hence 
$G(p^m) \supset B_1 \times \cdots \times B_t \times A_{t+1} \times \cdots \times A_k$. 

On the other hand, if $x = a_1^{\lambda_1} a_2^{\lambda_2} \cdots a_k^{\lambda_k}$ is in $G(p^m)$, since it then satisfies 
$x^{p^m} = e$, we get $e = x^{p^m} = a_1^{\lambda_1 p^m} \cdots a_k^{\lambda_k p^m}$. However, the product of the 
subgroups $A_1, \ldots, A_k$ is direct, so we get 

$$a_1^{\lambda_1 p^m} = e, \; \ldots, \; a_k^{\lambda_k p^m} = e.$$ 

Thus the order of $a_i$, that is, $p^{n_i}$, must divide $\lambda_i p^m$ for $i = 1, 2, \ldots, k$. If 
$i \geq t + 1$ this is automatically true whatever be the choice of $\lambda_{t+1}, \ldots, \lambda_k$ 
since $m \geq n_{t+1} \geq \cdots \geq n_k$, hence $p^{n_i} \mid p^m$, $i \geq t + 1$. However, for 
$i \leq t$, we get from $p^{n_i} \mid \lambda_i p^m$ that $p^{n_i - m} \mid \lambda_i$. Therefore $\lambda_i = v_i p^{n_i - m}$ for 
some integer $v_i$. Putting all this information into the values of the $\lambda_i$'s in 
the expression for $x$ as $x = a_1^{\lambda_1} \cdots a_k^{\lambda_k}$ we see that 

$$x = a_1^{v_1 p^{n_1 - m}} \cdots a_t^{v_t p^{n_t - m}} a_{t+1}^{\lambda_{t+1}} \cdots a_k^{\lambda_k}.$$ 

This says that $x \in B_1 \times \cdots \times B_t \times A_{t+1} \times \cdots \times A_k$. 

Now since each $B_i$ is of order $p^m$ and since $o(A_i) = p^{n_i}$ and since 
$G(p^m) = B_1 \times \cdots \times B_t \times A_{t+1} \times \cdots \times A_k$, 

$$o(G(p^m)) = o(B_1) o(B_2) \cdots o(B_t) o(A_{t+1}) \cdots o(A_k) = \underbrace{p^m p^m \cdots p^m}_{t \text{ times}} \, p^{n_{t+1}} \cdots p^{n_k}.$$ 

Thus, if we write $o(G(p^m)) = p^u$, then 

$$u = mt + \sum_{i=t+1}^{k} n_i.$$ 

The lemma is proved. 

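COROLLARY If $G$ is as in Lemma 2.14.2, then $o(G(p)) = p^k$. 

Proof. Apply the lemma to the case $m = 1$ (taking $t = 0$ if no invariant 
exceeds 1). Every invariant $n_i$ with $i > t$ satisfies $1 \geq n_i > 0$, so $n_i = 1$; 
hence $u = 1 \cdot t + (k - t) = k$ and so $o(G(p)) = p^k$. 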
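
The formula of Lemma 2.14.2 is easy to test by brute force on small cases. In the sketch below (a check under stated assumptions, not part of the proof) each cyclic group of order $p^{n_i}$ is modeled additively as the integers mod $p^{n_i}$, so that $x^{p^m} = e$ becomes $p^m x \equiv 0$; the function names are illustrative only.

```python
from itertools import product

def order_of_G_pm(p, invariants, m):
    """Count the x in Z_{p^n1} x ... x Z_{p^nk} with p^m * x = 0 by brute force."""
    moduli = [p ** n for n in invariants]
    return sum(
        all((p ** m * x) % q == 0 for x, q in zip(xs, moduli))
        for xs in product(*(range(q) for q in moduli))
    )

def predicted_exponent(invariants, m):
    """u = m*t + (sum of the n_i with n_i <= m), where t counts the n_i > m."""
    t = sum(1 for n in invariants if n > m)
    return m * t + sum(n for n in invariants if n <= m)
```

With p = 2, invariants (3, 2, 1) and m = 2, the count is 32 and the predicted exponent is 5; with m = 1 the count is $2^3$, in agreement with the corollary.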

We now have all the pieces required to prove the uniqueness of the 
invariants of an abelian group of order p n . 

THEOREM 2.14.2 Two abelian groups of order $p^n$ are isomorphic if and only 
if they have the same invariants. 

In other words, if $G$ and $G'$ are abelian groups of order $p^n$ and $G = A_1 \times \cdots \times A_k$, 
where each $A_i$ is a cyclic group of order $p^{n_i}$, $n_1 \geq \cdots \geq n_k > 0$, and $G' = 
B_1' \times \cdots \times B_s'$, where each $B_i'$ is a cyclic group of order $p^{h_i}$, $h_1 \geq \cdots \geq h_s > 0$, 
then $G$ and $G'$ are isomorphic if and only if $k = s$ and for each $i$, $n_i = h_i$. 

Proof. One way is very easy, namely, if $G$ and $G'$ have the same 
invariants then they are isomorphic. For then $G = A_1 \times \cdots \times A_k$ where 
$A_i = (a_i)$ is cyclic of order $p^{n_i}$, and $G' = B_1' \times \cdots \times B_k'$ where $B_i' = (b_i')$ 
is cyclic of order $p^{n_i}$. Map $G$ onto $G'$ by the map $\phi(a_1^{\alpha_1} \cdots a_k^{\alpha_k}) = 
(b_1')^{\alpha_1} \cdots (b_k')^{\alpha_k}$. We leave it to the reader to verify that this defines an 
isomorphism of $G$ onto $G'$. 

Now for the other direction. Suppose that $G = A_1 \times \cdots \times A_k$, 
$G' = B_1' \times \cdots \times B_s'$, $A_i$, $B_i'$ as described above, cyclic of orders $p^{n_i}$, $p^{h_i}$, 
respectively, where $n_1 \geq \cdots \geq n_k > 0$ and $h_1 \geq \cdots \geq h_s > 0$. We 
want to show that if $G$ and $G'$ are isomorphic then $k = s$ and each $n_i = h_i$. 

If $G$ and $G'$ are isomorphic then, by Lemma 2.14.1, $G(p^m)$ and $G'(p^m)$ 
must be isomorphic for any integer $m \geq 0$, hence must have the same order. 
Let's see what this gives us in the special case $m = 1$; that is, what 
information can we garner from $o(G(p)) = o(G'(p))$. According to the 
corollary to Lemma 2.14.2, $o(G(p)) = p^k$ and $o(G'(p)) = p^s$. Hence 
$p^k = p^s$ and so $k = s$. At least we now know that the number of invariants 
for $G$ and $G'$ is the same. 

If $n_i \neq h_i$ for some $i$, let $t$ be the first $i$ such that $n_t \neq h_t$; we may 
suppose that $n_t > h_t$. Let $m = h_t$. Consider the subgroups, $H = \{x^{p^m} \mid x \in G\}$ 
and $H' = \{(x')^{p^m} \mid x' \in G'\}$, of $G$ and $G'$, respectively. Since $G$ and $G'$ are 
isomorphic, it follows easily that $H$ and $H'$ are isomorphic. We now 
examine the invariants of $H$ and $H'$. 

Because $G = A_1 \times \cdots \times A_k$, where $A_i = (a_i)$ is of order $p^{n_i}$, we get that 
$H = C_1 \times \cdots \times C_t \times \cdots \times C_r$, 

where $C_i = (a_i^{p^m})$ is of order $p^{n_i - m}$, and where $r$ is such that $n_r > m = 
h_t \geq n_{r+1}$. Thus the invariants of $H$ are $n_1 - m, n_2 - m, \ldots, n_r - m$ 
and the number of invariants of $H$ is $r \geq t$. 

Because $G' = B_1' \times \cdots \times B_k'$, where $B_i' = (b_i')$ is cyclic of order $p^{h_i}$, 
we get that $H' = D_1' \times \cdots \times D_{t-1}'$, where $D_i' = ((b_i')^{p^m})$ is cyclic of order 
$p^{h_i - m}$. Thus the invariants of $H'$ are $h_1 - m, \ldots, h_{t-1} - m$ and so the 
number of invariants of $H'$ is $t - 1$. 

But $H$ and $H'$ are isomorphic; as we saw above this forces them to have 
the same number of invariants. But we saw that assuming that $n_i \neq h_i$ 
for some $i$ led to a discrepancy in the number of their invariants. In 
consequence each $n_i = h_i$, and the theorem is proved. 

An immediate consequence of this last theorem is that an abelian group 
of order $p^n$ can be decomposed in only one way, as far as the orders of the 
cyclic subgroups are concerned, as a direct product of cyclic subgroups. Hence 
the invariants are indeed the invariants of $G$ and completely determine $G$. 

If $n_1 \geq \cdots \geq n_k > 0$, $n = n_1 + \cdots + n_k$, is any partition of $n$, then 
we can easily construct an abelian group of order $p^n$ whose invariants are 
$n_1 \geq \cdots \geq n_k > 0$. To do this, let $A_i$ be a cyclic group of order $p^{n_i}$ and 
let $G = A_1 \times \cdots \times A_k$ be the external direct product of $A_1, \ldots, A_k$. 
Then, by the very definition, the invariants of $G$ are $n_1 \geq \cdots \geq n_k > 0$. 
Finally, two different partitions of $n$ give rise to nonisomorphic abelian 
groups of order $p^n$. This, too, comes from Theorem 2.14.2. Hence we have 
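
One concrete way to see that the invariants are determined by the group is to recover them from the orders of the subgroups $G(p^m)$ alone. The sketch below does this; note that it is a variation based on Lemma 2.14.2, not the argument with the subgroups $H$ and $H'$ used in the proof above, and that the function names are hypothetical. It writes $o(G(p^m)) = p^{u_m}$, uses $u_m = \sum_i \min(n_i, m)$, and observes that $u_m - u_{m-1}$ counts the invariants that are at least $m$.

```python
def exponents_from_invariants(invariants):
    """Return [u_0, u_1, ..., u_top], where o(G(p^m)) = p^(u_m)."""
    top = max(invariants)
    return [sum(min(n, m) for n in invariants) for m in range(top + 1)]

def invariants_from_exponents(u):
    """Recover n_1 >= ... >= n_k from the list of exponents u_m."""
    # counts[m - 1] = number of invariants that are >= m
    counts = [u[m] - u[m - 1] for m in range(1, len(u))]
    k = counts[0] if counts else 0
    # The (i+1)-st invariant is the number of m for which more than i invariants are >= m.
    return tuple(sum(1 for c in counts if c > i) for i in range(k))
```

Since isomorphic groups have subgroups $G(p^m)$ of equal order by Lemma 2.14.1, this round trip is another indication of why the invariants cannot depend on the decomposition chosen.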

THEOREM 2.14.3 The number of nonisomorphic abelian groups of order $p^n$, 
$p$ a prime, equals the number of partitions of $n$. 

Note that the answer given in Theorem 2.14.3 does not depend on the 
prime $p$; it only depends on the exponent $n$. Hence, for instance, the number 
of nonisomorphic abelian groups of order $2^4$ equals that of orders $3^4$, or 
$5^4$, etc. Since there are five partitions of 4, namely: $4 = 4$, $3 + 1$, $2 + 2$, 
$2 + 1 + 1$, $1 + 1 + 1 + 1$, there are five nonisomorphic abelian 
groups of order $p^4$ for any prime $p$. 
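
The partitions of a small integer, and hence the possible sets of invariants, can be listed mechanically. The following is a minimal sketch; the generator name `partitions` and its `largest` parameter are choices made for this illustration.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as nonincreasing tuples of positive integers."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        # Every later part must not exceed the part just chosen.
        for rest in partitions(n - first, first):
            yield (first,) + rest
```

Each tuple produced for n = 4 is a possible set of invariants $n_1 \geq \cdots \geq n_k > 0$ for an abelian group of order $p^4$, and there are five of them, as stated above.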

Since any finite abelian group is a direct product of its Sylow subgroups, 
and two abelian groups are isomorphic if and only if their corresponding 
Sylow subgroups are isomorphic, we have the 

COROLLARY The number of nonisomorphic abelian groups of order $p_1^{\alpha_1} \cdots p_r^{\alpha_r}$, 
where the $p_i$ are distinct primes and where each $\alpha_i > 0$, is $p(\alpha_1) p(\alpha_2) \cdots p(\alpha_r)$, 
where $p(u)$ denotes the number of partitions of $u$. 
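
The corollary turns counting abelian groups of a given order into arithmetic. A short sketch, assuming trial-division factoring is adequate for the sizes involved (the function names are illustrative):

```python
def num_partitions(n):
    """Number of partitions of n, by the standard coin-counting recurrence."""
    ways = [1] + [0] * n
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]

def prime_exponents(n):
    """Exponents in the prime factorization of n, found by trial division."""
    exps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            e = 0
            while n % d == 0:
                n //= d
                e += 1
            exps.append(e)
        d += 1
    if n > 1:
        exps.append(1)   # one remaining prime factor, to the first power
    return exps

def count_abelian_groups(n):
    """Number of nonisomorphic abelian groups of order n."""
    result = 1
    for e in prime_exponents(n):
        result *= num_partitions(e)
    return result
```

For example, for order $2^3 \cdot 3^4 \cdot 5 = 3240$ the count is $p(3) p(4) p(1) = 3 \cdot 5 \cdot 1 = 15$.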

Problems 

1. If $G$ is an abelian group of order $p^n$, $p$ a prime, and $n_1 \geq n_2 \geq \cdots \geq 
n_k > 0$ are the invariants of $G$, show that the maximal order of any 
element in $G$ is $p^{n_1}$. 

2. If $G$ is a group, $A_1, \ldots, A_k$ normal subgroups of $G$ such that $A_i \cap 
(A_1 A_2 \cdots A_{i-1}) = (e)$ for all $i$, show that $G$ is the direct product of 
$A_1, \ldots, A_k$ if $G = A_1 A_2 \cdots A_k$. 

3. Using Theorem 2.14.1, prove that if a finite abelian group has 
subgroups of orders $m$ and $n$, then it has a subgroup whose order is the least 
common multiple of $m$ and $n$. 

4. Describe all finite abelian groups of order 

(a) $2^6$. (b) $11^6$. (c) $7^5$. (d) $2^4 \cdot 3^4$. 

5. Show how to get all abelian groups of order $2^3 \cdot 3^4 \cdot 5$. 

6. If $G$ is an abelian group of order $p^n$ with invariants $n_1 \geq \cdots \geq n_k > 0$ 
and $H \neq (e)$ is a subgroup of $G$, show that if $h_1 \geq \cdots \geq h_s > 0$ are 
the invariants of $H$, then $k \geq s$ and for each $i$, $h_i \leq n_i$ for $i = 1, 2, \ldots, s$. 

If $G$ is an abelian group, let $\hat{G}$ be the set of all homomorphisms of $G$ 
into the group of nonzero complex numbers under multiplication. 
If $\phi_1, \phi_2 \in \hat{G}$, define $\phi_1 \cdot \phi_2$ by $(\phi_1 \cdot \phi_2)(g) = \phi_1(g) \phi_2(g)$ for all $g \in G$. 

7. Show that $\hat{G}$ is an abelian group under the operation defined. 

8. If $\phi \in \hat{G}$ and $G$ is finite, show that $\phi(g)$ is a root of unity for every 
$g \in G$. 

9. If $G$ is a finite cyclic group, show that $\hat{G}$ is cyclic and $o(\hat{G}) = o(G)$, 
hence $G$ and $\hat{G}$ are isomorphic. 

10. If $g_1 \neq g_2$ are in $G$, $G$ a finite abelian group, prove that there is a 
$\phi \in \hat{G}$ with $\phi(g_1) \neq \phi(g_2)$. 

11. If $G$ is a finite abelian group prove that $o(G) = o(\hat{G})$ and $G$ is 
isomorphic to $\hat{G}$. 

12. If $\phi \neq 1 \in \hat{G}$ where $G$ is an abelian group, show that $\sum_{g \in G} \phi(g) = 0$. 

Supplementary Problems 

There is no relation between the order in which the problems appear and 
the order of appearance of the sections, in this chapter, which might be 
relevant to their solutions. No hint is given regarding the difficulty of any 
problem. 

1. (a) If $G$ is a finite abelian group with elements $a_1, a_2, \ldots, a_n$, prove 
that $a_1 a_2 \cdots a_n$ is an element whose square is the identity. 

(b) If the $G$ in part (a) has no element of order 2 or more than one 
element of order 2, prove that $a_1 a_2 \cdots a_n = e$. 

(c) If $G$ has one element, $y$, of order 2, prove that $a_1 a_2 \cdots a_n = y$. 

(d) (Wilson's theorem) If $p$ is a prime number show that $(p - 1)! \equiv 
-1 \; (p)$. 

2. If $p$ is an odd prime and if 

$$1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{p - 1} = \frac{a}{b},$$ 

where $a$ and $b$ are integers, prove that $p \mid a$. If $p > 3$, prove that 
$p^2 \mid a$. 

3. If $p$ is an odd prime, $a \not\equiv 0 \; (p)$ is said to be a quadratic residue of $p$ if 
there exists an integer $x$ such that $x^2 \equiv a \; (p)$. Prove 

(a) The quadratic residues of $p$ form a subgroup $Q$ of the group of 
nonzero integers mod $p$ under multiplication. 

(b) $o(Q) = (p - 1)/2$. 

(c) If $q \in Q$, $n \notin Q$ ($n$ is called a nonresidue), then $nq$ is a nonresidue. 

(d) If $n_1$, $n_2$ are nonresidues, then $n_1 n_2$ is a residue. 

(e) If $a$ is a quadratic residue of $p$, then $a^{(p-1)/2} \equiv +1 \; (p)$. 

4. Prove that in the integers mod $p$, $p$ a prime, there are at most $n$ 
solutions of $x^n \equiv 1 \; (p)$ for every integer $n$. 

5. Prove that the nonzero integers mod $p$ under multiplication form a 
cyclic group if $p$ is a prime. 

6. Give an example of a non-abelian group in which $(xy)^3 = x^3 y^3$ for 
all $x$ and $y$. 

7. If $G$ is a finite abelian group, prove that the number of solutions of 
$x^n = e$ in $G$, where $n \mid o(G)$, is a multiple of $n$. 

8. Same as Problem 7, but do not assume the group to be abelian. 

9. Find all automorphisms of $S_3$ and $S_4$, the symmetric groups of degree 
3 and 4. 

DEFINITION A group $G$ is said to be solvable if there exist subgroups $G = 
N_0 \supset N_1 \supset N_2 \supset \cdots \supset N_r = (e)$ such that each $N_i$ is normal in $N_{i-1}$ and 
$N_{i-1}/N_i$ is abelian. 

10. Prove that a subgroup of a solvable group and the homomorphic 
image of a solvable group must be solvable. 

11. If $G$ is a group and $N$ is a normal subgroup of $G$ such that both $N$ 
and $G/N$ are solvable, prove that $G$ is solvable. 

12. If $G$ is a group, $A$ a subgroup of $G$ and $N$ a normal subgroup of $G$, 
prove that if both $A$ and $N$ are solvable then so is $AN$. 

13. If $G$ is a group, define the sequence of subgroups $G^{(i)}$ of $G$ by 

(1) $G^{(1)}$ = commutator subgroup of $G$ = subgroup of $G$ generated 
by all $aba^{-1}b^{-1}$ where $a, b \in G$. 

(2) $G^{(i)}$ = commutator subgroup of $G^{(i-1)}$ if $i > 1$. 

Prove 

(a) Each $G^{(i)}$ is a normal subgroup of $G$. 

(b) $G$ is solvable if and only if $G^{(k)} = (e)$ for some $k \geq 1$. 

14. Prove that a solvable group always has an abelian normal subgroup 
$M \neq (e)$. 

If $G$ is a group, define the sequence of subgroups $G_{(i)}$ by 

(a) $G_{(1)}$ = commutator subgroup of $G$. 

(b) $G_{(i)}$ = subgroup of $G$ generated by all $aba^{-1}b^{-1}$ where $a \in G$, 
$b \in G_{(i-1)}$. 

$G$ is said to be nilpotent if $G_{(k)} = (e)$ for some $k \geq 1$. 

15. (a) Show that each $G_{(i)}$ is a normal subgroup of $G$ and $G_{(i)} \supset G^{(i)}$. 

(b) If $G$ is nilpotent, prove it must be solvable. 

(c) Give an example of a group which is solvable but not nilpotent. 

16. Show that any subgroup and homomorphic image of a nilpotent group 
must be nilpotent. 

17. Show that every homomorphic image, different from $(e)$, of a 
nilpotent group has a nontrivial center. 

18. (a) Show that any group of order $p^n$, $p$ a prime, must be nilpotent. 

(b) If $G$ is nilpotent, and $H \neq G$ is a subgroup of $G$, prove that 
$N(H) \neq H$ where $N(H) = \{x \in G \mid xHx^{-1} = H\}$. 

19. If G is a finite group, prove that G is nilpotent if and only if G is the 
direct product of its Sylow subgroups. 

20. Let $G$ be a finite group and $H$ a subgroup of $G$. For $A$, $B$ subgroups 
of $G$, define $A$ to be conjugate to $B$ relative to $H$ if $B = x^{-1}Ax$ for 
some $x \in H$. Prove 

(a) This defines an equivalence relation on the set of subgroups of G. 

(b) The number of subgroups of G conjugate to A relative to H 
equals the index of N(A) n H in H. 

21. (a) If $G$ is a finite group and if $P$ is a $p$-Sylow subgroup of $G$, prove 
that $P$ is the only $p$-Sylow subgroup in $N(P)$. 

(b) If $P$ is a $p$-Sylow subgroup of $G$ and if $a^{p^k} = e$ then, if $a \in N(P)$, 
$a$ must be in $P$. 

(c) Prove that $N(N(P)) = N(P)$. 

22. (a) If $G$ is a finite group and $P$ is a $p$-Sylow subgroup of $G$, prove 
that the number of conjugates of $P$ in $G$ is not a multiple of $p$. 

(b) Breaking up the conjugate class of $P$ further by using conjugacy 
relative to $P$, prove that the conjugate class of $P$ has $1 + kp$ 
distinct subgroups. (Hint: Use part (b) of Problem 20 and 
Problem 21. Note that together with Problem 23 this gives an 
alternative proof of Theorem 2.12.3, the third part of Sylow's 
theorem.) 

23. (a) If $P$ is a $p$-Sylow subgroup of $G$ and $B$ is a subgroup of $G$ of order 
$p^k$, prove that if $B$ is not contained in some conjugate of $P$, then 
the number of conjugates of $P$ in $G$ is a multiple of $p$. 

(b) Using part (a) and Problem 22, prove that $B$ must be contained 
in some conjugate of $P$. 

(c) Prove that any two $p$-Sylow subgroups of $G$ are conjugate in $G$. 
(This gives another proof of Theorem 2.12.2, the second part of 
Sylow's theorem.) 

24. Combine Problems 22 and 23 to give another proof of all parts of 
Sylow's theorem. 

25. Making a case-by-case discussion using the results developed in this 
chapter, prove that any group of order less than 60 either is of prime 
order or has a nontrivial normal subgroup. 

26. Using the result of Problem 25, prove that any group of order less 
than 60 is solvable. 

27. Show that the equation $x^2 a x = a^{-1}$ is solvable for $x$ in the group 
$G$ if and only if $a$ is the cube of some element in $G$. 

28. Prove that (1 2 3) is not a cube of any element in $S_n$. 

29. Prove that $xax = b$ is solvable for $x$ in $G$ if and only if $ab$ is the square 
of some element in $G$. 

30. If $G$ is a group and $a \in G$ is of finite order and has only a finite number 
of conjugates in $G$, prove that these conjugates of $a$ generate a finite 
normal subgroup of $G$. 

31. Show that a group cannot be written as the set-theoretic union of 
two proper subgroups. 

32. Show that a group $G$ is the set-theoretic union of three proper 
subgroups if and only if $G$ has, as a homomorphic image, a noncyclic 
group of order 4. 

#33. Let $p$ be a prime and let $Z_p$ be the integers mod $p$ under addition and 
multiplication. Let $G$ be the group of all matrices 

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$$ 

where $a, b, c, d \in Z_p$ are such that $ad - bc = 1$. Let 

$$C = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \right\}$$ 

and let $LF(2, p) = G/C$. 

(a) Find the order of $LF(2, p)$. 

(b) Prove that $LF(2, p)$ is simple if $p \geq 5$. 

#34. Prove that $LF(2, 5)$ is isomorphic to $A_5$, the alternating group of 
degree 5. 

#35. Let G = LF(2, 7); according to Problem 33, G is a simple group of 
order 168. Determine exactly how many 2-Sylow, 3-Sylow, and 
7-Sylow subgroups there are in G. 


Supplementary Reading 

Burnside, W., Theory of Groups of Finite Order, 2nd ed. Cambridge, England: 

Cambridge University Press, 1911; New York: Dover Publications, 1955. 

Hall, Marshall, Theory of Groups. New York: The Macmillan Company, 1961. 

Topics for Class Discussion 

Alperin, J. L., “A classification of n-abelian groups,” Canadian Journal of 
Mathematics, Vol. XXI (1969), pages 1238-1244. 

McKay, James, H., “Another proof of Cauchy’s group theorem,” American 
Mathematical Monthly, Vol. 66 (1959), page 119. 

Segal, I. E., “The automorphisms of the symmetric group,” Bulletin of the American 
Mathematical Society, Vol. 46 (1940), page 565. 



3 

Ring Theory 


3.1 Definition and Examples of Rings 

As we indicated in Chapter 2, there are certain algebraic systems 
which serve as the building blocks for the structures comprising the 
subject which is today called modern algebra. At this stage of the 
development we have learned something about one of these, namely 
groups. It is our purpose now to introduce and to study a second 
such, namely rings. The abstract concept of a group has its origins 
in the set of mappings, or permutations, of a set onto itself. In contrast, 
rings stem from another and more familiar source, the set of 
integers. We shall see that they are patterned after, and are 
generalizations of, the algebraic aspects of the ordinary integers. 

In the next paragraph it will become clear that a ring is quite 
different from a group in that it is a two-operational system; these 
operations are usually called addition and multiplication. Yet, 
despite the differences, the analysis of rings will follow the pattern 
already laid out for groups. We shall require the appropriate analogs 
of homomorphism, normal subgroups, factor groups, etc. With the 
experience gained in our study of groups we shall be able to make the 
requisite definitions, intertwine them with meaningful theorems, and 
end up proving results which are both interesting and important 
about mathematical objects with which we have had long acquaintance. 
To cite merely one instance, later on in the book, using the tools 
developed here, we shall prove that it is impossible to trisect an angle 
of 60° using only a straight-edge and compass. 

DEFINITION A nonempty set R is said to be an associative ring if in R 
there are defined two operations, denoted by + and • respectively, such 
that for all a, b, c in R: 

1. a + b is in R. 

2. a + b = b + a. 

3. (a + b) + c = a + (b + c). 

4. There is an element 0 in R such that a + 0 = a (for every a in R). 

5. There exists an element — a in R such that a + ( — a ) =0. 

6. a • b is in R. 

7. a • (b • c) = (a • b) • c. 

8. a • (b + c) = a • b + a • c and (b + c) • a = b • a + c • a (the two 
distributive laws). 

Axioms 1 through 5 merely state that R is an abelian group under the 
operation +, which we call addition. Axioms 6 and 7 insist that R be closed 
under an associative operation •, which we call multiplication. Axiom 8 
serves to interrelate the two operations of R. 

Whenever we speak of ring it will be understood we mean associative 
ring. Nonassociative rings, that is, those in which axiom 7 may fail to hold, 
do occur in mathematics and are studied, but we shall have no occasion to 
consider them. 

It may very well happen, or not happen, that there is an element 1 in 
R such that a • 1 = 1 • a = a for every a in R; if there is such we shall 
describe R as a ring with unit element. 

If the multiplication of R is such that a • b = b • a for every a, b in R, then 
we call R a commutative ring. 

Before going on to work out some properties of rings, we pause to examine 
some examples. Motivated by these examples we shall define various 
special types of rings which are of importance. 

Example 3.1.1 R is the set of integers, positive, negative, and 0; + is 
the usual addition and • the usual multiplication of integers. R is a 
commutative ring with unit element. 

Example 3.1.2 R is the set of even integers under the usual operations 
of addition and multiplication. R is a commutative ring but has no unit 
element. 

Example 3.1.3 R is the set of rational numbers under the usual addition 
and multiplication of rational numbers. R is a commutative ring with unit 
element. But even more than that, note that the elements of R different 
from 0 form an abelian group under multiplication. A ring with this latter 
property is called a field. 

Example 3.1.4 R is the set of integers mod 7 under the addition and 
multiplication mod 7. That is, the elements of R are the seven symbols 
$\bar{0}, \bar{1}, \bar{2}, \bar{3}, \bar{4}, \bar{5}, \bar{6}$, where 

1. $\bar{i} + \bar{j} = \bar{k}$ where $k$ is the remainder of $i + j$ on division by 7 (thus, for 
instance, $\bar{4} + \bar{5} = \bar{2}$ since $4 + 5 = 9$, which, when divided by 7, 
leaves a remainder of 2). 

2. $\bar{i} \cdot \bar{j} = \bar{m}$ where $m$ is the remainder of $ij$ on division by 7 (thus, $\bar{5} \cdot \bar{3} = \bar{1}$ 
since $5 \cdot 3 = 15$ has 1 as a remainder on division by 7). 

The student should verify that R is a commutative ring with unit element. 
However, much more can be shown; namely, since 

$\bar{1} \cdot \bar{1} = \bar{1} = \bar{6} \cdot \bar{6}$, 

$\bar{2} \cdot \bar{4} = \bar{1} = \bar{4} \cdot \bar{2}$, 

$\bar{3} \cdot \bar{5} = \bar{1} = \bar{5} \cdot \bar{3}$, 

the nonzero elements of R form an abelian group under multiplication. 
R is thus a field. Since it only has a finite number of elements it is called a 
finite field. 

Example 3.1.5 R is the set of integers mod 6 under addition and 
multiplication mod 6. If we denote the elements in R by $\bar{0}, \bar{1}, \bar{2}, \ldots, \bar{5}$, 
one sees that $\bar{2} \cdot \bar{3} = \bar{0}$, yet $\bar{2} \neq \bar{0}$ and $\bar{3} \neq \bar{0}$. Thus it is possible in a ring R 
that $a \cdot b = 0$ with neither $a = 0$ nor $b = 0$. This cannot happen in a field 
(see Problem 10, end of Section 3.2), thus the ring R in this example is 
certainly not a field. 
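
The contrast between Examples 3.1.4 and 3.1.5 can be checked by direct computation. In this small sketch the residues are represented by the integers 0, ..., n - 1, and the function names are illustrative only.

```python
def nonzero_inverses(n):
    """Map each nonzero a mod n to its multiplicative inverse, when it has one."""
    return {a: b for a in range(1, n) for b in range(1, n) if (a * b) % n == 1}

def zero_divisor_pairs(n):
    """Pairs (a, b) of nonzero residues, a <= b, whose product is 0 mod n."""
    return [(a, b) for a in range(1, n) for b in range(a, n) if (a * b) % n == 0]
```

Mod 7 every nonzero residue appears as a key of `nonzero_inverses(7)`, matching the three displayed relations in Example 3.1.4, whereas mod 6 only 1 and 5 are invertible and `zero_divisor_pairs(6)` contains the pair (2, 3).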

Every example given so far has been a commutative ring. We now 
present a noncommutative ring. 

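Example 3.1.6 R will be the set of all symbols 

$$\alpha_{11}e_{11} + \alpha_{12}e_{12} + \alpha_{21}e_{21} + \alpha_{22}e_{22} = \sum_{i,j=1}^{2} \alpha_{ij}e_{ij},$$

where all the $\alpha_{ij}$ are rational numbers and where we decree 

$$\sum_{i,j=1}^{2} \alpha_{ij}e_{ij} = \sum_{i,j=1}^{2} \beta_{ij}e_{ij} \tag{1}$$

if and only if for all i, j = 1, 2, $\alpha_{ij} = \beta_{ij}$, 

$$\sum_{i,j=1}^{2} \alpha_{ij}e_{ij} + \sum_{i,j=1}^{2} \beta_{ij}e_{ij} = \sum_{i,j=1}^{2} (\alpha_{ij} + \beta_{ij})e_{ij}, \tag{2}$$

$$\left(\sum_{i,j=1}^{2} \alpha_{ij}e_{ij}\right) \cdot \left(\sum_{i,j=1}^{2} \beta_{ij}e_{ij}\right) = \sum_{i,j=1}^{2} \gamma_{ij}e_{ij}, \tag{3}$$

where 

$$\gamma_{ij} = \sum_{\nu=1}^{2} \alpha_{i\nu}\beta_{\nu j} = \alpha_{i1}\beta_{1j} + \alpha_{i2}\beta_{2j}.$$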

This multiplication, when first seen, looks rather complicated. However, 
it is founded on relatively simple rules, namely, multiply $\sum \alpha_{ij}e_{ij}$ by 
$\sum \beta_{ij}e_{ij}$ formally, multiplying out term by term, and collecting terms, 
and using the relations $e_{ij} \cdot e_{kl} = 0$ for $j \neq k$, $e_{ij} \cdot e_{jl} = e_{il}$ in this 
term-by-term collecting. 
(Of course those of the readers who have already encountered some linear 
algebra will recognize this example as the ring of all 2 x 2 matrices over 
the field of rational numbers.) 

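To illustrate the multiplication, if $a = e_{11} - e_{21} + e_{22}$ and 
$b = e_{22} + 3e_{12}$, then 

$$\begin{aligned}
a \cdot b &= (e_{11} - e_{21} + e_{22}) \cdot (e_{22} + 3e_{12}) \\
&= e_{11} \cdot e_{22} + 3e_{11} \cdot e_{12} - e_{21} \cdot e_{22} - 3e_{21} \cdot e_{12} + e_{22} \cdot e_{22} + 3e_{22} \cdot e_{12} \\
&= 0 + 3e_{12} - 0 - 3e_{22} + e_{22} + 0 \\
&= 3e_{12} - 3e_{22} + e_{22} = 3e_{12} - 2e_{22}.
\end{aligned}$$

Note that $e_{11} \cdot e_{12} = e_{12}$ whereas $e_{12} \cdot e_{11} = 0$. Thus the multiplication 
in R is not commutative. Also it is possible for $u \cdot v = 0$ with $u \neq 0$ and 
$v \neq 0$. 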

The student should verify that R is indeed a ring. It is called the ring of 
2x2 rational matrices. It, and its relative, will occupy a good deal of 
our time later on in the book. 
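
The rule for multiplying the symbols $e_{ij}$ is easy to mechanize. In the sketch below, which is only an illustration, an element of R is stored as a dictionary sending the index pair (i, j) to the rational coefficient of $e_{ij}$; this representation and the name multiply are our own choices.

```python
from fractions import Fraction

def multiply(x, y):
    """Multiply two elements given as dicts {(i, j): coefficient of e_ij}."""
    product = {}
    for (i, j), alpha in x.items():
        for (k, l), beta in y.items():
            if j == k:  # e_ij * e_kl = e_il when j == k, and 0 otherwise
                product[(i, l)] = product.get((i, l), Fraction(0)) + alpha * beta
    # Drop zero coefficients so that equal elements have equal dictionaries.
    return {key: c for key, c in product.items() if c != 0}

a = {(1, 1): Fraction(1), (2, 1): Fraction(-1), (2, 2): Fraction(1)}  # e11 - e21 + e22
b = {(2, 2): Fraction(1), (1, 2): Fraction(3)}                        # e22 + 3 e12
ab = multiply(a, b)

e11 = {(1, 1): Fraction(1)}
e12 = {(1, 2): Fraction(1)}
print(ab, multiply(e11, e12), multiply(e12, e11))
```

The product ab has coefficient 3 on $e_{12}$ and -2 on $e_{22}$, and the two products of $e_{11}$ with $e_{12}$ differ, as computed by hand above.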

Example 3.1.7 Let C be the set of all symbols $(\alpha, \beta)$ where $\alpha, \beta$ are 
real numbers. We define 

$$(\alpha, \beta) = (\gamma, \delta) \text{ if and only if } \alpha = \gamma \text{ and } \beta = \delta. \tag{1}$$

In C we introduce an addition by defining for $x = (\alpha, \beta)$, $y = (\gamma, \delta)$ 

$$x + y = (\alpha, \beta) + (\gamma, \delta) = (\alpha + \gamma, \beta + \delta). \tag{2}$$

Note that x + y is again in C. We assert that C is an abelian group under 
this operation with (0, 0) serving as the identity element for addition, and 
$(-\alpha, -\beta)$ as the inverse, under addition, of $(\alpha, \beta)$. 

Now that C is endowed with an addition, in order to make of C a ring 
we still need a multiplication. We achieve this by defining 
for $X = (\alpha, \beta)$, $Y = (\gamma, \delta)$ in C, 

$$X \cdot Y = (\alpha, \beta) \cdot (\gamma, \delta) = (\alpha\gamma - \beta\delta, \alpha\delta + \beta\gamma). \tag{3}$$

Note that $X \cdot Y = Y \cdot X$. Also $X \cdot (1, 0) = (1, 0) \cdot X = X$ so that (1, 0) 
is a unit element for C. 

Again we notice that $X \cdot Y \in C$. Also, if $X = (\alpha, \beta) \neq (0, 0)$ then, 
since $\alpha, \beta$ are real and not both 0, $\alpha^2 + \beta^2 \neq 0$; thus 

$$Y = \left(\frac{\alpha}{\alpha^2 + \beta^2}, \frac{-\beta}{\alpha^2 + \beta^2}\right)$$

is in C. Finally we see that 

$$(\alpha, \beta) \cdot \left(\frac{\alpha}{\alpha^2 + \beta^2}, \frac{-\beta}{\alpha^2 + \beta^2}\right) = (1, 0).$$

All in all we have shown that C is a field. If we write $(\alpha, \beta)$ as $\alpha + \beta i$, 
the reader may verify that C is merely a disguised form of the familiar 
complex numbers. 
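
As a check on the formulas, one may carry out the arithmetic of pairs directly. The sketch below is an illustration only: the names mul and inverse are ours, and rational coordinates (Python's Fraction) stand in for real numbers so that the arithmetic is exact.

```python
from fractions import Fraction

def mul(x, y):
    """(alpha, beta) * (gamma, delta) = (alpha*gamma - beta*delta, alpha*delta + beta*gamma)."""
    (alpha, beta), (gamma, delta) = x, y
    return (alpha * gamma - beta * delta, alpha * delta + beta * gamma)

def inverse(x):
    """The pair Y of the text; defined whenever x is not (0, 0)."""
    alpha, beta = x
    norm = alpha * alpha + beta * beta
    return (alpha / norm, -beta / norm)

X = (Fraction(3), Fraction(4))
Y = inverse(X)
print(Y, mul(X, Y))

# Comparison with Python's built-in complex numbers on one sample product.
pair_product = mul((1, 2), (3, 4))
complex_product = complex(1, 2) * complex(3, 4)
print(pair_product, complex_product)
```

The product of X with its inverse is (1, 0), and the sample product of pairs agrees with the product of the corresponding complex numbers.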


Example 3.1.8 This last example is often called the ring of real quaternions. 
This ring was first described by the Irish mathematician Hamilton. Initially 
it was extensively used in the study of mechanics; today its primary interest 
is that of an important example, although it still plays key roles in geometry 
and number theory. 

Let Q be the set of all symbols $\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$, where all the 
numbers $\alpha_0, \alpha_1, \alpha_2$, and $\alpha_3$ are real numbers. We declare two such symbols, 
$\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ and $\beta_0 + \beta_1 i + \beta_2 j + \beta_3 k$, to be equal if and only 
if $\alpha_t = \beta_t$ for t = 0, 1, 2, 3. In order to make Q into a ring we must define 
a + and a · for its elements. To this end we define 

1. For any $X = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$, $Y = \beta_0 + \beta_1 i + \beta_2 j + \beta_3 k$ in Q, 

$$X + Y = (\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k) + (\beta_0 + \beta_1 i + \beta_2 j + \beta_3 k) = (\alpha_0 + \beta_0) + (\alpha_1 + \beta_1)i + (\alpha_2 + \beta_2)j + (\alpha_3 + \beta_3)k$$

and 

2. 
$$\begin{aligned}
X \cdot Y &= (\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k) \cdot (\beta_0 + \beta_1 i + \beta_2 j + \beta_3 k) \\
&= (\alpha_0\beta_0 - \alpha_1\beta_1 - \alpha_2\beta_2 - \alpha_3\beta_3) + (\alpha_0\beta_1 + \alpha_1\beta_0 + \alpha_2\beta_3 - \alpha_3\beta_2)i \\
&\quad + (\alpha_0\beta_2 + \alpha_2\beta_0 + \alpha_3\beta_1 - \alpha_1\beta_3)j + (\alpha_0\beta_3 + \alpha_3\beta_0 + \alpha_1\beta_2 - \alpha_2\beta_1)k.
\end{aligned}$$

Admittedly this formula for the product seems rather formidable; however, 
it looks much more complicated than it actually is. It comes from multiplying 
out two such symbols formally and collecting terms using the relations 
$i^2 = j^2 = k^2 = ijk = -1$, $ij = -ji = k$, $jk = -kj = i$, $ki = -ik = j$. 
The latter part of these relations, called the multiplication table of the 
quaternion units, can be remembered by a little diagram in which i, j, and k 
are placed clockwise around a circle. As you go around clockwise you read 
off the product, e.g., ij = k, jk = i, ki = j; while going around 
counterclockwise you read off the negatives. 

Notice that the elements $\pm 1, \pm i, \pm j, \pm k$ form a non-abelian group of 
order 8 under this product. In fact, this is the group we called the group 
of quaternion units in Chapter 2. 


The reader may prove that Q is a noncommutative ring in which 
$0 = 0 + 0i + 0j + 0k$ and $1 = 1 + 0i + 0j + 0k$ serve as the zero and 
unit elements respectively. Now if $X = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ is not 0, 
then not all of $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ are 0; since they are real, 
$\beta = \alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2 \neq 0$ follows. Thus 

$$Y = \frac{\alpha_0}{\beta} - \frac{\alpha_1}{\beta} i - \frac{\alpha_2}{\beta} j - \frac{\alpha_3}{\beta} k \in Q.$$

A simple computation now shows that $X \cdot Y = 1$. Thus the nonzero 
elements of Q form a non-abelian group under multiplication. A ring in 
which the nonzero elements form a group is called a division ring or 
skew-field. Of course, a commutative division ring is a field. Q affords us a 
division ring which is not a field. Many other examples of noncommutative 
division rings exist, but we would be going too far afield to present one here. 
The investigation of the nature of division rings and the attempts to classify 
them form an important part of algebra. 
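
The "simple computation" can be delegated to a machine. The following sketch is ours and is meant only as a check: a quaternion is stored as a 4-tuple of its coefficients, the names qmul and qinv are hypothetical, and rational coefficients stand in for real ones to keep the arithmetic exact.

```python
from fractions import Fraction

def qmul(x, y):
    """Product of two quaternions given as coefficient tuples (a0, a1, a2, a3)."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (
        a0 * b0 - a1 * b1 - a2 * b2 - a3 * b3,
        a0 * b1 + a1 * b0 + a2 * b3 - a3 * b2,
        a0 * b2 + a2 * b0 + a3 * b1 - a1 * b3,
        a0 * b3 + a3 * b0 + a1 * b2 - a2 * b1,
    )

def qinv(x):
    """The element Y of the text: conjugate coefficients divided by beta."""
    a0, a1, a2, a3 = x
    beta = a0 * a0 + a1 * a1 + a2 * a2 + a3 * a3  # nonzero when x != 0
    return (a0 / beta, -a1 / beta, -a2 / beta, -a3 / beta)

ONE = (1, 0, 0, 0)
I, J, K = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

X = (Fraction(1), Fraction(2), Fraction(3), Fraction(4))
Y = qinv(X)
print(qmul(I, J), qmul(J, I), qmul(X, Y))
```

Here qmul(I, J) returns the coefficients of k while qmul(J, I) returns those of -k, and the product of X with Y is the unit element.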


3.2 Some Special Classes of Rings 

The examples just discussed in Section 3.1 point out clearly that although 
rings are a direct generalization of the integers, certain arithmetic facts to 
which we have become accustomed in the ring of integers need not hold in 
general rings. For instance, we have seen the possibility of a • b = 0 with 
neither a nor b being zero. Natural examples exist where $a \cdot b \neq b \cdot a$. 
All these run counter to our experience heretofore. 

For simplicity of notation we shall henceforth drop the dot in a • b and 
merely write this product as ab. 

DEFINITION If R is a commutative ring, then $a \neq 0 \in R$ is said to be a 
zero-divisor if there exists a $b \in R$, $b \neq 0$, such that ab = 0. 

DEFINITION A commutative ring is an integral domain if it has no 
zero-divisors. 

The ring of integers, naturally enough, is an example of an integral 
domain. 

DEFINITION A ring is said to be a division ring if its nonzero elements 
form a group under multiplication. 

The unit element under multiplication will be written as 1, and the 
inverse of an element a under multiplication will be denoted by $a^{-1}$. 

Finally we make the definition of the ultra-important object known as a 
field. 

DEFINITION A field is a commutative division ring. 

In our examples in Section 3.1, we exhibited the noncommutative 
division ring of real quaternions and the following fields: the rational 
numbers, complex numbers, and the integers mod 7. Chapter 5 will concern 
itself with fields and their properties. 

We wish to be able to compute in rings in much the same manner in 
which we compute with real numbers, keeping in mind always that there 
are differences: it may happen that $ab \neq ba$, or that one cannot divide. 
To this end we prove the next lemma, which asserts that certain things we 
should like to be true in rings are indeed true. 

LEMMA 3.2.1 If R is a ring, then for all $a, b \in R$ 

1. $a0 = 0a = 0$. 

2. $a(-b) = (-a)b = -(ab)$. 

3. $(-a)(-b) = ab$. 

If, in addition, R has a unit element 1, then 

4. $(-1)a = -a$. 

5. $(-1)(-1) = 1$. 

Proof. 

1. If $a \in R$, then $a0 = a(0 + 0) = a0 + a0$ (using the right distributive 
law), and since R is a group under addition, this equation implies that 
$a0 = 0$. 

Similarly, $0a = (0 + 0)a = 0a + 0a$, using the left distributive law, 
and so here too, $0a = 0$ follows. 

2. In order to show that $a(-b) = -(ab)$ we must demonstrate that 
$ab + a(-b) = 0$. But $ab + a(-b) = a(b + (-b)) = a0 = 0$ by use of 
the distributive law and the result of part 1 of this lemma. Similarly 
$(-a)b = -(ab)$. 

3. That $(-a)(-b) = ab$ is really a special case of part 2; we single it 
out since its analog in the case of real numbers has been so stressed in our 
early education. So on with it: 

$$\begin{aligned}
(-a)(-b) &= -(a(-b)) && \text{(by part 2)} \\
&= -(-(ab)) && \text{(by part 2)} \\
&= ab
\end{aligned}$$

since $-(-x) = x$ is a consequence of the fact that in any group 
$(u^{-1})^{-1} = u$. 

4. Suppose that R has a unit element 1; then $a + (-1)a = 1a + (-1)a = 
(1 + (-1))a = 0a = 0$, whence $(-1)a = -a$. In particular, if $a = -1$, 
$(-1)(-1) = -(-1) = 1$, which establishes part 5. 

With this lemma out of the way we shall, from now on, feel free to compute 
with negatives and 0 as we always have in the past. The result of Lemma 
3.2.1 is our permit to do so. For convenience, a + (-b) will be written 
a — b. 

The lemma just proved, while it is very useful and important, is not very 
exciting. So let us proceed to results of greater interest. Before we do so, 
we enunciate a principle which, though completely trivial, provides a 
mighty weapon when wielded properly. This principle says no more or less 
than the following: if a postman distributes 101 letters to 100 mailboxes 
then some mailbox must receive at least two letters. It does not sound very 
promising as a tool, does it? Yet it will surprise us! Mathematical ideas 
can often be very difficult and obscure, but no such argument can be made 
against this very simple-minded principle given above. We formalize it and 
even give it a name. 

THE PIGEONHOLE PRINCIPLE If n objects are distributed over m places, 
and if n > m, then some place receives at least two objects. 

An equivalent formulation, and one which we shall often use is: If n 
objects are distributed over n places in such a way that no place receives 
more than one object, then each place receives exactly one object. 

We immediately make use of this idea in proving 

LEMMA 3.2.2 A finite integral domain is a field. 

Proof. As we may recall, an integral domain is a commutative ring such 
that ab = 0 if and only if at least one of a or b is itself 0. A field, on the 
other hand, is a commutative ring with unit element in which every 
nonzero element has a multiplicative inverse in the ring. 

Let D be a finite integral domain. In order to prove that D is a field we 
must 

1. Produce an element $1 \in D$ such that $a1 = a$ for every $a \in D$. 

2. For every element $a \neq 0 \in D$ produce an element $b \in D$ such that 
$ab = 1$. 

Let $x_1, x_2, \ldots, x_n$ be all the elements of D, and suppose that $a \neq 0 \in D$. 
Consider the elements $x_1a, x_2a, \ldots, x_na$; they are all in D. We claim that 
they are all distinct! For suppose that $x_ia = x_ja$ for $i \neq j$; then $(x_i - x_j)a = 0$. 
Since D is an integral domain and $a \neq 0$, this forces $x_i - x_j = 0$, and 
so $x_i = x_j$, contradicting $i \neq j$. Thus $x_1a, x_2a, \ldots, x_na$ are n distinct 
elements lying in D, which has exactly n elements. By the pigeonhole 
principle these must account for all the elements of D; stated otherwise, 
every element $y \in D$ can be written as $x_ia$ for some $x_i$. In particular, since 
$a \in D$, $a = x_{i_0}a$ for some $x_{i_0} \in D$. Since D is commutative, 
$a = x_{i_0}a = ax_{i_0}$. We propose to show that $x_{i_0}$ acts as a unit element for 
every element of D. For, if $y \in D$, as we have seen, $y = x_ia$ for some 
$x_i \in D$, and so $yx_{i_0} = (x_ia)x_{i_0} = x_i(ax_{i_0}) = x_ia = y$. Thus $x_{i_0}$ is a unit 
element for D and we write it as 1. Now $1 \in D$, so by our previous argument, 
it too is realizable as a multiple of a; that is, there exists a $b \in D$ such 
that $1 = ba$. The lemma is now completely proved. 
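
The proof is constructive enough to be followed step by step on a small example. The sketch below does so for the integers mod 7; it is an illustration only, and the function unit_and_inverse, which takes the list of elements and the multiplication as arguments, is our own device.

```python
def unit_and_inverse(elements, mul, a):
    """Follow the proof of Lemma 3.2.2 for a nonzero a in a finite integral domain."""
    multiples = {x: mul(x, a) for x in elements}           # x -> x*a
    # The multiples are all distinct, so by the pigeonhole principle they fill D.
    assert len(set(multiples.values())) == len(elements)
    # a itself is some multiple x0*a; the proof shows that x0 is the unit element.
    unit = next(x for x, xa in multiples.items() if xa == a)
    # The unit is in turn a multiple b*a, and this b is the inverse of a.
    inv = next(x for x, xa in multiples.items() if xa == unit)
    return unit, inv

D = list(range(7))

def mul7(x, y):
    return (x * y) % 7

results = {a: unit_and_inverse(D, mul7, a) for a in D if a != 0}
print(results)
```

For every nonzero a the procedure recovers the same unit element 1 together with an inverse of a. Applied to a ring with zero-divisors, such as the integers mod 6 with a = 2, the distinctness check inside the function would fail, which is precisely where the hypothesis "integral domain" enters.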

COROLLARY If p is a prime number then $J_p$, the ring of integers mod p, is a 
field. 

Proof. By the lemma it is enough to prove that $J_p$ is an integral domain, 
since it only has a finite number of elements. If $a, b \in J_p$ and $ab = 0$, 
then p must divide the ordinary integer ab, and so p, being a prime, must 
divide a or b. But then either $a \equiv 0 \bmod p$ or $b \equiv 0 \bmod p$, hence in 
$J_p$ one of these is 0. 
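
A brute-force experiment illustrates the corollary for small moduli. The sketch is ours (the names is_field_mod and is_prime are hypothetical); it tests, for each n below 30, whether every nonzero residue mod n has an inverse. Note that the corollary asserts only that a prime modulus gives a field; that composite moduli fail is seen from zero-divisors as in Example 3.1.5.

```python
def is_field_mod(n):
    """True when every nonzero residue mod n has a multiplicative inverse."""
    return all(any((a * b) % n == 1 for b in range(1, n)) for a in range(1, n))

def is_prime(n):
    """Trial division; adequate for the small range used here."""
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

field_moduli = [n for n in range(2, 30) if is_field_mod(n)]
prime_moduli = [n for n in range(2, 30) if is_prime(n)]
print(field_moduli)
```

In this range the moduli giving fields are exactly the primes.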

The corollary above assures us that we can find an infinity of fields 
having a finite number of elements. Such fields are called finite fields. The 
fields $J_p$ do not give all the examples of finite fields; there are others. In 
fact, in Section 7.1 we give a complete description of all finite fields. 

We point out a striking difference between finite fields and fields such as 
the rational numbers, real numbers, or complex numbers, with which we 
are more familiar. 

Let F be a finite field having q elements (if you wish, think of $J_p$ with its 
p elements). Viewing F merely as a group under addition, since F has q 
elements, by Corollary 2 to Theorem 2.4.1, 

$$\underbrace{a + a + \cdots + a}_{q\text{-times}} = qa = 0$$

for any $a \in F$. Thus, in F, we have qa = 0 for some positive integer q, even 
if $a \neq 0$. This certainly cannot happen in the field of rational numbers, 
for instance. We formalize this distinction in the definitions we give below. 
In these definitions, instead of talking just about fields, we choose to widen 
the scope a little and talk about integral domains. 

DEFINITION An integral domain D is said to be of characteristic 0 if the 
relation ma = 0, where $a \neq 0$ is in D, and where m is an integer, can hold 
only if m = 0. 

The ring of integers is thus of characteristic 0, as are other familiar rings 
such as the even integers or the rationals. 

DEFINITION An integral domain D is said to be of finite characteristic if 
there exists a positive integer m such that ma = 0 for all $a \in D$. 

If D is of finite characteristic, then we define the characteristic of D to be 
the smallest positive integer p such that pa = 0 for all $a \in D$. It is not too 
hard to prove that if D is of finite characteristic, then its characteristic is a prime 
number (see Problem 6 below). 

As we pointed out, any finite field is of finite characteristic. However, an 
integral domain may very well be infinite yet be of finite characteristic (see 
Problem 7). 
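
For the fields $J_p$ the characteristic can be found by direct search. The sketch below is an illustration under our own naming (characteristic_mod); it looks for the smallest positive m with ma = 0 for every residue a.

```python
def characteristic_mod(p):
    """Smallest positive m with m*a = 0 in the integers mod p for every a."""
    m = 1
    while any((m * a) % p != 0 for a in range(p)):
        m += 1  # the loop stops at m = p at the latest
    return m

characteristics = {p: characteristic_mod(p) for p in (2, 3, 5, 7, 11)}
print(characteristics)
```

As expected, the characteristic of $J_p$ found this way is p itself.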

One final remark on this question of characteristic: Why define it for 
integral domains, why not for arbitrary rings? The question is perfectly 
reasonable. Perhaps the example we give now points out what can happen 
if we drop the assumption “integral domain.” 

Let R be the set of all triples (a, b, c), where $a \in J_2$, the integers mod 2, 
$b \in J_3$, the integers mod 3, and c is any integer. We introduce a + and a · 
to make of R a ring. We do so by defining 
$(a_1, b_1, c_1) + (a_2, b_2, c_2) = (a_1 + a_2, b_1 + b_2, c_1 + c_2)$ and 
$(a_1, b_1, c_1) \cdot (a_2, b_2, c_2) = (a_1a_2, b_1b_2, c_1c_2)$. 
It is easy to verify that R is a commutative ring. It is not an integral domain 
since (1, 2, 0) · (0, 0, 7) = (0, 0, 0), the zero-element of R. Note that in R, 
2(1, 0, 0) = (1, 0, 0) + (1, 0, 0) = (2, 0, 0) = (0, 0, 0) since addition in 
the first component is in $J_2$. Similarly 3(0, 1, 0) = (0, 0, 0). Finally, for 
no positive integer m is m(0, 0, 1) = (0, 0, 0). 
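
The ring of triples is easily modeled, and the three computations above can be repeated mechanically. The sketch is ours: the names add, mul, and times are hypothetical, and the last claim can of course only be tested for finitely many m (here m < 50), not proved, by such an experiment.

```python
ZERO = (0, 0, 0)

def add(x, y):
    """Componentwise sum: mod 2, mod 3, and ordinary integers."""
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 3, x[2] + y[2])

def mul(x, y):
    """Componentwise product: mod 2, mod 3, and ordinary integers."""
    return ((x[0] * y[0]) % 2, (x[1] * y[1]) % 3, x[2] * y[2])

def times(m, x):
    """m*x = x + x + ... + x (m summands), for a positive integer m."""
    total = ZERO
    for _ in range(m):
        total = add(total, x)
    return total

print(mul((1, 2, 0), (0, 0, 7)), times(2, (1, 0, 0)), times(3, (0, 1, 0)))
never_zero = all(times(m, (0, 0, 1)) != ZERO for m in range(1, 50))
print(never_zero)
```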

Thus, from the point of view of the definition we gave above for 
characteristic, the ring R, which we just looked at, is neither fish nor fowl. The 
definition just doesn’t have any meaning for R. We could generalize the 
notion of characteristic to arbitrary rings by doing it locally, defining it 
relative to given elements, rather than globally for the ring itself. We say 
that R has n-torsion, n > 0, if there is an element $a \neq 0$ in R such that 
na = 0, and $ma \neq 0$ for 0 < m < n. For an integral domain D, it turns 
out that if D has n-torsion, even for one n > 0, then it must be of finite 
characteristic (see Problem 8). 

Problems 

R is a ring in all the problems. 

1. If $a, b, c, d \in R$, evaluate $(a + b)(c + d)$. 

2. Prove that if $a, b \in R$, then $(a + b)^2 = a^2 + ab + ba + b^2$, where 
by $x^2$ we mean xx. 

3. Find the form of the binomial theorem in a general ring; in other words, 
find an expression for $(a + b)^n$, where n is a positive integer. 

4. If every $x \in R$ satisfies $x^2 = x$, prove that R must be commutative. 
(A ring in which $x^2 = x$ for all elements is called a Boolean ring.) 

5. If R is a ring, merely considering it as an abelian group under its 
addition, we have defined, in Chapter 2, what is meant by na, where 
$a \in R$ and n is an integer. Prove that if $a, b \in R$ and n, m are integers, 
then $(na)(mb) = (nm)(ab)$. 

6. If D is an integral domain and D is of finite characteristic, prove that 
the characteristic of D is a prime number. 

7. Give an example of an integral domain which has an infinite number 
of elements, yet is of finite characteristic. 

8. If D is an integral domain and if na = 0 for some $a \neq 0$ in D and 
some integer $n \neq 0$, prove that D is of finite characteristic. 

9. If R is a system satisfying all the conditions for a ring with unit 
element with the possible exception of a + b = b + a, prove that the axiom 
a + b = b + a must hold in R and that R is thus a ring. (Hint: 
Expand $(a + b)(1 + 1)$ in two ways.) 

10. Show that the commutative ring D is an integral domain if and only 
if for $a, b, c \in D$ with $a \neq 0$ the relation ab = ac implies that b = c. 

11. Prove that Lemma 3.2.2 is false if we drop the assumption that the 
integral domain is finite. 

12. Prove that any field is an integral domain. 

13. Using the pigeonhole principle, prove that if m and n are relatively 
prime integers and a and b are any integers, there exists an integer x 
such that $x \equiv a \bmod m$ and $x \equiv b \bmod n$. (Hint: Consider the 
remainders of $a, a + m, a + 2m, \ldots, a + (n - 1)m$ on division by n.) 

14. Using the pigeonhole principle, prove that the decimal expansion of 
a rational number must, after some point, become repeating. 

3.3 Homomorphisms 

In studying groups we have seen that the concept of a homomorphism 
turned out to be a fruitful one. This suggests that the appropriate analog 
for rings could also lead to important ideas. To recall, for groups a 
homomorphism was defined as a mapping such that $\phi(ab) = \phi(a)\phi(b)$. Since 
a ring has two operations, what could be a more natural extension of this 
type of formula than the 

DEFINITION A mapping $\phi$ from the ring R into the ring R' is said to be a 
homomorphism if 

1. $\phi(a + b) = \phi(a) + \phi(b)$, 

2. $\phi(ab) = \phi(a)\phi(b)$, 

for all $a, b \in R$. 

As in the case of groups, let us again stress here that the + and • occurring 
on the left-hand sides of the relations in 1 and 2 are those of R , whereas the 
+ and • occurring on the right-hand sides are those of R'. 

A useful observation to make is that a homomorphism of one ring, R , 
into another, R', is, if we totally ignore the multiplications in both these 
rings, at least a homomorphism of R into R' when we consider them as 
abelian groups under their respective additions. Therefore, as far as 
addition is concerned, all the properties about homomorphisms of groups 
proved in Chapter 2 carry over. In particular, merely restating Lemma 
2.7.2 for the case of the additive group of a ring yields for us 

LEMMA 3.3.1 If $\phi$ is a homomorphism of R into R', then 

1. $\phi(0) = 0$. 

2. $\phi(-a) = -\phi(a)$ for every $a \in R$. 

A word of caution: if both R and R' have the respective unit elements 
1 and 1' for their multiplications it need not follow that $\phi(1) = 1'$. 
However, if R' is an integral domain, or if R' is arbitrary but $\phi$ is onto, then 
$\phi(1) = 1'$ is indeed true. 
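
To see that the caution is not idle, here is one possible illustration, which is ours and not taken from the text: let R be the ring of integers, let R' be the ring of pairs of integers with componentwise addition and multiplication (so that its unit element is (1, 1)), and let the mapping send a to (a, 0). The sketch checks the homomorphism conditions on a sample of integers.

```python
def phi(a):
    """The mapping a -> (a, 0) from the integers into pairs of integers."""
    return (a, 0)

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    return (x[0] * y[0], x[1] * y[1])

UNIT_OF_PAIRS = (1, 1)  # unit element for componentwise multiplication

sample = range(-5, 6)
is_homomorphism_on_sample = all(
    phi(a + b) == add(phi(a), phi(b)) and phi(a * b) == mul(phi(a), phi(b))
    for a in sample for b in sample
)
print(is_homomorphism_on_sample, phi(1), UNIT_OF_PAIRS)
```

The mapping respects both operations, yet it carries 1 to (1, 0), which is not the unit element (1, 1). Observe that the ring of pairs is not an integral domain and that the mapping is not onto, so there is no conflict with the statement above.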

In the case of groups, given a homomorphism we associated with this 
homomorphism a certain subset of the group which we called the kernel of 
the homomorphism. What should the appropriate definition of the kernel 
of a homomorphism be for rings? After all, the ring has two operations, 
addition and multiplication, and it might be natural to ask which of these 
should be singled out as the basis for the definition. However, the choice 
is clear. Built into the definition of an arbitrary ring is the condition that 
the ring forms an abelian group under addition. The ring multiplication 
was left much more unrestricted, and so, in a sense, much less under our 
control than is the addition. For this reason the emphasis is given to the 
operation of addition in the ring, and we make the 

DEFINITION If $\phi$ is a homomorphism of R into R' then the kernel of $\phi$, 
$I(\phi)$, is the set of all elements $a \in R$ such that $\phi(a) = 0$, the zero-element 
of R'. 

LEMMA 3.3.2 If $\phi$ is a homomorphism of R into R' with kernel $I(\phi)$, then 

1. $I(\phi)$ is a subgroup of R under addition. 

2. If $a \in I(\phi)$ and $r \in R$ then both ar and ra are in $I(\phi)$. 

Proof. Since $\phi$ is, in particular, a homomorphism of R, as an additive 
group, into R', as an additive group, (1) follows directly from our results in 
group theory. 

To see (2), suppose that $a \in I(\phi)$, $r \in R$. Then $\phi(a) = 0$ so that 
$\phi(ar) = \phi(a)\phi(r) = 0\phi(r) = 0$ by Lemma 3.2.1. Similarly $\phi(ra) = 0$. Thus 
by defining property of $I(\phi)$ both ar and ra are in $I(\phi)$. 

Before proceeding we examine these concepts for certain examples. 

Example 3.3.1 Let R and R' be two arbitrary rings and define $\phi(a) = 0$ 
for all $a \in R$. Trivially $\phi$ is a homomorphism and $I(\phi) = R$. $\phi$ is called 
the zero-homomorphism. 

Example 3.3.2 Let R be a ring, R' = R and define $\phi(x) = x$ for every 
$x \in R$. Clearly $\phi$ is a homomorphism and $I(\phi)$ consists only of 0. 

Example 3.3.3 Let $J(\sqrt{2})$ be all real numbers of the form $m + n\sqrt{2}$ 
where m, n are integers; $J(\sqrt{2})$ forms a ring under the usual addition and 
multiplication of real numbers. (Verify!) Define $\phi: J(\sqrt{2}) \to J(\sqrt{2})$ by 
$\phi(m + n\sqrt{2}) = m - n\sqrt{2}$. $\phi$ is a homomorphism of $J(\sqrt{2})$ onto $J(\sqrt{2})$ 
and its kernel $I(\phi)$ consists only of 0. (Verify!) 
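
Part of the verification requested in Example 3.3.3 can be carried out on a sample. In the sketch below, which is our own illustration, the number $m + n\sqrt{2}$ is stored as the pair (m, n) of integers, so that all arithmetic is exact; a finite sample is of course no substitute for the proof.

```python
def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def mul(x, y):
    """(m1 + n1*sqrt2)(m2 + n2*sqrt2) = (m1*m2 + 2*n1*n2) + (m1*n2 + n1*m2)*sqrt2."""
    (m1, n1), (m2, n2) = x, y
    return (m1 * m2 + 2 * n1 * n2, m1 * n2 + n1 * m2)

def phi(x):
    """The mapping m + n*sqrt2 -> m - n*sqrt2."""
    m, n = x
    return (m, -n)

sample = [(m, n) for m in range(-3, 4) for n in range(-3, 4)]
respects_sum = all(phi(add(x, y)) == add(phi(x), phi(y)) for x in sample for y in sample)
respects_product = all(phi(mul(x, y)) == mul(phi(x), phi(y)) for x in sample for y in sample)
kernel_in_sample = [x for x in sample if phi(x) == (0, 0)]
print(respects_sum, respects_product, kernel_in_sample)
```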

Example 3.3.4 Let J be the ring of integers, $J_n$, the ring of integers 
modulo n. Define $\phi: J \to J_n$ by $\phi(a)$ = remainder of a on division by n. 
The student should verify that $\phi$ is a homomorphism of J onto $J_n$ and that 
the kernel, $I(\phi)$, of $\phi$ consists of all multiples of n. 
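
For a particular modulus the verification of Example 3.3.4 can be tried out numerically. The sketch is ours and takes n = 6 on a window of integers; it is an experiment, not a proof.

```python
n = 6

def phi(a):
    """Remainder of a on division by n (Python's % lies in 0..n-1 for positive n)."""
    return a % n

sample = range(-20, 21)
respects_sum = all(phi(a + b) == (phi(a) + phi(b)) % n for a in sample for b in sample)
respects_product = all(phi(a * b) == (phi(a) * phi(b)) % n for a in sample for b in sample)
kernel_in_sample = [a for a in sample if phi(a) == 0]
print(respects_sum, respects_product, kernel_in_sample)
```

Within the window the kernel consists exactly of the multiples of 6.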

Example 3.3.5 Let R be the set of all continuous, real-valued functions 
on the closed unit interval. R is made into a ring by the usual addition and 
multiplication of functions; that it is a ring is a consequence of the fact 
that the sum and product of two continuous functions are continuous 
functions. Let F be the ring of real numbers and define $\phi: R \to F$ by 
$\phi(f(x)) = f(\tfrac{1}{2})$. $\phi$ is then a homomorphism of R onto F and its kernel 
consists of all functions in R vanishing at $x = \tfrac{1}{2}$. 
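
Evaluation at a point is readily imitated with functions in a programming language. In the sketch below, which is only an illustration, the two sample functions f and g are our own choices of continuous (indeed polynomial) functions, and exact rational arithmetic is used at the point 1/2.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def phi(func):
    """Evaluate the function func at the point 1/2."""
    return func(HALF)

def f(x):
    return x * x + 1

def g(x):
    return 2 * x - 1   # vanishes at x = 1/2

def f_plus_g(x):
    return f(x) + g(x)

def f_times_g(x):
    return f(x) * g(x)

print(phi(f), phi(g), phi(f_plus_g), phi(f_times_g))
```

The sum and product of the functions are carried to the sum and product of the values, and g, which vanishes at 1/2, is sent to 0.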

All the examples given here have used commutative rings. Many 
beautiful examples exist where the rings are noncommutative but it would 
be premature to discuss such an example now. 

DEFINITION A homomorphism of R into R' is said to be an isomorphism 
if it is a one-to-one mapping. 

DEFINITION Two rings are said to be isomorphic if there is an isomorphism 
of one onto the other. 

The remarks made in Chapter 2 about the meaning of an isomorphism 
and of the statement that two groups are isomorphic carry over verbatim 
to rings. Likewise, the criterion given in Lemma 2.7.4 that a homomorphism 
be an isomorphism translates directly from groups to rings in the form 

LEMMA 3.3.3 The homomorphism $\phi$ of R into R' is an isomorphism if and 
only if $I(\phi) = (0)$. 

3.4 Ideals and Quotient Rings 

Once the idea of a homomorphism and its kernel have been set up for rings, 
based on our experience with groups, it should be fruitful to carry over 
some analog to rings of the concept of normal subgroup. Once this is 
achieved, one would hope that this analog would lead to a construction in 
rings like that of the quotient group of a group by a normal subgroup. 

Finally, if one were an optimist, one would hope that the homomorphism 
theorems for groups would come over in their entirety to rings. 

Fortunately all this can be done, thereby providing us with an incisive 
technique for analyzing rings. 

The first business at hand, then, seems to be to define a suitable “normal 
subgroup” concept for rings. With a little hindsight this is not difficult. 

If you recall, normal subgroups eventually turned out to be nothing else 
than kernels of homomorphisms, even though their primary defining 
conditions did not involve homomorphisms. Why not use this observation 
as the keystone to our definition for rings? Lemma 3.3.2 has already 
provided us with some conditions that a subset of a ring be the kernel of a 
homomorphism. We now take the point of view that, since no other 
information is at present available to us, we shall make the conclusions of 
Lemma 3.3.2 as the starting point of our endeavor, and so we define 

DEFINITION A nonempty subset U of R is said to be a (two-sided) ideal 
of R if 

1. U is a subgroup of R under addition. 

2. For every $u \in U$ and $r \in R$, both ur and ru are in U. 

Condition 2 asserts that U “swallows up” multiplication from the right 
and left by arbitrary ring elements. For this reason U is usually called a 
two-sided ideal. Since we shall have no occasion, other than in some of the 
problems, to use any other derivative concept of ideal, we shall merely use 
the word ideal, rather than two-sided ideal, in all that follows. 

Given an ideal U of a ring R, let R/U be the set of all the distinct cosets 
of U in R which we obtain by considering U as a subgroup of R under 
addition. We note that we merely say coset, rather than right coset or left 
coset; this is justified since R is an abelian group under addition. To restate 
what we have just said, R/U consists of all the cosets, a + U, where $a \in R$. 
By the results of Chapter 2, R/U is automatically a group under addition; 
this is achieved by the composition law (a + U) + (b + U) = (a + b) + U. 
In order to impose a ring structure on R/U we must define, in it, a 
multiplication. What is more natural than to define (a + U)(b + U) = 
ab + U? However, we must make sure that this is meaningful. Otherwise 
put, we are obliged to show that if a + U = a' + U and b + U = b' + U, 
then under our definition of the multiplication, (a + U)(b + U) = 
(a' + U)(b' + U). Equivalently, it must be established that ab + U = 
a'b' + U. To this end we first note that since a + U = a' + U, 
$a = a' + u_1$, where $u_1 \in U$; similarly $b = b' + u_2$ where $u_2 \in U$. Thus 
$ab = (a' + u_1)(b' + u_2) = a'b' + u_1b' + a'u_2 + u_1u_2$; since U is an ideal of 
R, $u_1b' \in U$, $a'u_2 \in U$, and $u_1u_2 \in U$. Consequently $u_1b' + a'u_2 + u_1u_2 = 
u_3 \in U$. But then $ab = a'b' + u_3$, from which we deduce that ab + U = 
$a'b' + u_3 + U$, and since $u_3 \in U$, $u_3 + U = U$. The net consequence 
of all this is that ab + U = a'b' + U. We at least have achieved the 
principal step on the road to our goal, namely of introducing a well-defined 
multiplication. The rest now becomes routine. To establish that R/U is a 
ring we merely have to go through the various axioms which define a ring 
and check whether they hold in R/U. All these verifications have a certain 
sameness to them, so we pick one axiom, the right distributive law, and 
prove it holds in R/U. The rest we leave to the student as informal exercises. 
If X = a + U, Y = b + U, Z = c + U are three elements of R/U, 
where $a, b, c \in R$, then (X + Y)Z = ((a + U) + (b + U))(c + U) = 
((a + b) + U)(c + U) = (a + b)c + U = ac + bc + U = (ac + U) + 
(bc + U) = (a + U)(c + U) + (b + U)(c + U) = XZ + YZ. 
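
The principal step, that the product of cosets does not depend on the representatives chosen, can be watched in action in the simplest case. The sketch below is our own illustration, taking R to be the ring of integers and U the multiples of n = 6; the representatives a' = a + ns and b' = b + nt range over a small window.

```python
n = 6  # U is the ideal of all multiples of n in the ring of integers

def same_coset(x, y):
    """x + U = y + U exactly when x - y lies in U."""
    return (x - y) % n == 0

# Replace a by a' = a + n*s and b by b' = b + n*t and compare a'b' + U with ab + U.
well_defined = all(
    same_coset((a + n * s) * (b + n * t), a * b)
    for a in range(n) for b in range(n)
    for s in range(-2, 3) for t in range(-2, 3)
)
print(well_defined)
```

Whatever representatives are used, the product lands in the same coset, as the argument above guarantees.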

R/U has now been made into a ring. Clearly, if R is commutative then 
so is R/U, for (a + U)(b + U) = ab + U = ba + U = (b + U)(a + U). 
(The converse to this is false.) If R has a unit element 1, then R/U has a 
unit element 1 + U. We might ask: In what relation is R/U to R? With 
the experience we now have in hand this is easy to answer. There is a 
homomorphism $\phi$ of R onto R/U given by $\phi(a) = a + U$ for every $a \in R$, 
whose kernel is exactly U. (The reader should verify that $\phi$ so defined is a 
homomorphism of R onto R/U with kernel U.) 

We summarize these remarks in 

LEMMA 3.4.1 If U is an ideal of the ring R, then R/U is a ring and is a 
homomorphic image of R. 

With this construction of the quotient ring of a ring by an ideal satisfactorily 
accomplished, we are ready to bring over to rings the homomorphism 
theorems of groups. Since the proof is an exact verbatim translation of that 
for groups into the language of rings we merely state the theorem without 
proof, referring the reader to Chapter 2 for the proof. 

THEOREM 3.4.1 Let R, R' be rings and $\phi$ a homomorphism of R onto R' with 
kernel U. Then R' is isomorphic to R/U. Moreover there is a one-to-one correspondence 
between the set of ideals of R' and the set of ideals of R which contain U. This 
correspondence can be achieved by associating with an ideal W' in R' the ideal W in 
R defined by $W = \{x \in R \mid \phi(x) \in W'\}$. With W so defined, R/W is isomorphic 
to R'/W'. 
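
The correspondence of the theorem can be tabulated in a small case. In the sketch below, which is ours, R is the ring of integers, R' is the ring of integers mod 12, and the homomorphism is reduction mod 12. We assume, as a known fact not proved at this point of the text, that every ideal of the integers mod 12 is generated by a single element; all names in the code are hypothetical.

```python
n = 12

def ideal_generated(g):
    """The ideal of the integers mod n consisting of all multiples of g."""
    return frozenset((g * r) % n for r in range(n))

# Assumption: every ideal of the integers mod n arises in this way.
ideals = {ideal_generated(g) for g in range(n)}

def preimage_generator(w_prime):
    """Least positive integer whose residue mod n lies in the ideal w_prime."""
    return min(x for x in range(1, n + 1) if x % n in w_prime)

correspondence = {preimage_generator(w): sorted(w) for w in ideals}
generators = sorted(correspondence)

# On a window of integers, the preimage W is exactly the set of multiples of d.
window = range(-24, 25)
preimage_is_multiples = all(
    ((x % n) in w) == (x % preimage_generator(w) == 0)
    for w in ideals for x in window
)
print(generators, preimage_is_multiples)
```

The ideals W obtained are the multiples of d for d = 1, 2, 3, 4, 6, 12, that is, precisely those containing the kernel, the multiples of 12. Moreover the ideal W' corresponding to d has 12/d elements, so R'/W' has d cosets, the same number as R/W.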

Problems 

1. If U is an ideal of R and $1 \in U$, prove that U = R. 

2. If F is a field, prove its only ideals are (0) and F itself. 

3. Prove that any homomorphism of a field is either an isomorphism or 
takes each element into 0. 

4. If R is a commutative ring and $a \in R$, 

(a) Show that $aR = \{ar \mid r \in R\}$ is a two-sided ideal of R. 

(b) Show by an example that this may be false if R is not commutative. 

5. If U, V are ideals of R, let $U + V = \{u + v \mid u \in U, v \in V\}$. Prove 
that U + V is also an ideal. 

6. If U, V are ideals of R let UV be the set of all elements that can be 
written as finite sums of elements of the form uv where $u \in U$ and 
$v \in V$. Prove that UV is an ideal of R. 

7. In Problem 6 prove that $UV \subset U \cap V$. 

8. If R is the ring of integers, let U be the ideal consisting of all multiples 
of 17. Prove that if V is an ideal of R and $R \supset V \supset U$ then either 
V = R or V = U. Generalize! 

9. If U is an ideal of R, let $r(U) = \{x \in R \mid xu = 0 \text{ for all } u \in U\}$. 
Prove that r(U) is an ideal of R. 

10. If U is an ideal of R let $[R : U] = \{x \in R \mid rx \in U \text{ for every } r \in R\}$. 
Prove that [R : U] is an ideal of R and that it contains U. 

11. Let R be a ring with unit element. Using its elements we define a 
ring $\tilde{R}$ by defining $a \oplus b = a + b + 1$, and $a \cdot b = ab + a + b$, 
where $a, b \in R$ and where the addition and multiplication on the 
right-hand side of these relations are those of R. 

(a) Prove that $\tilde{R}$ is a ring under the operations $\oplus$ and $\cdot$. 

(b) What acts as the zero-element of $\tilde{R}$? 

(c) What acts as the unit-element of $\tilde{R}$? 

(d) Prove that R is isomorphic to $\tilde{R}$. 

*12. In Example 3.1.6 we discussed the ring of rational 2 × 2 matrices. 
Prove that this ring has no ideals other than (0) and the ring itself. 

*13. In Example 3.1.8 we discussed the real quaternions. Using this as a 
model we define the quaternions over the integers mod p, p an odd 
prime number, in exactly the same way; however, now considering 
all symbols of the form $\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$, where $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ 
are integers mod p. 

(a) Prove that this is a ring with $p^4$ elements whose only ideals are 
(0) and the ring itself. 

**(b) Prove that this ring is not a division ring. 

If R is any ring a subset L of R is called a left-ideal of R if 
1. L is a subgroup of R under addition. 

2 r £ R, a £ L implies ra £ L. 

(One can similarly define a right-ideal.) An ideal is thus simultaneously a 
left- and right-ideal of R. 

14. For a g R let Ra = {xa | x £ R}. Prove that Ra is a left-ideal of R. 

15. Prove that the intersection of two left-ideals of R is a left-ideal of R. 

16. What can you say about the intersection of a left-ideal and right-ideal 
of R? 

17. If R is a ring and a ∈ R let r(a) = {x ∈ R | ax = 0}. Prove that
r(a) is a right-ideal of R.

18. If R is a ring and L is a left-ideal of R let λ(L) = {x ∈ R | xa = 0 for
all a ∈ L}. Prove that λ(L) is a two-sided ideal of R.

*19. Let R be a ring in which x^3 = x for every x ∈ R. Prove that R is a
commutative ring. 

20. If R is a ring with unit element 1 and φ is a homomorphism of R onto
R', prove that φ(1) is the unit element of R'.

21. If R is a ring with unit element 1 and φ is a homomorphism of R into
an integral domain R' such that I(φ) ≠ R, prove that φ(1) is the unit
element of R'.


3.5 More Ideals and Quotient Rings 

We continue the discussion of ideals and quotient rings. 

Let us take the point of view, for the moment at least, that a field is the 
most desirable kind of ring. Why? If for no other reason, we can divide in 
a field, so operations and results in a field more closely approximate our 
experience with real and complex numbers. In addition, as was illustrated 
by Problem 2 in the preceding problem set, a field has no homomorphic 
images other than itself or the trivial ring consisting of 0. Thus we cannot 
simplify a field by applying a homomorphism to it. Taking these remarks 
into consideration it is natural that we try to link a general ring, in some 
fashion, with fields. What should this linkage involve? We have a machinery 
whose component parts are homomorphisms, ideals, and quotient rings. 
With these we will forge the link. 

But first we must make precise the rather vague remarks of the preceding 
paragraph. We now ask the explicit question: Under what conditions is the 
homomorphic image of a ring a field? For commutative rings we give a 
complete answer in this section. 

Essential to treating this question is the converse to the result of Problem 
2 of the problem list at the end of Section 3.4. 


LEMMA 3.5.1 Let R be a commutative ring with unit element whose only ideals 
are (0) and R itself. Then R is a field. 

Proof. To prove this lemma we must produce, for any a ≠ 0 in R, an
element b ≠ 0 in R such that ab = 1.

So, suppose that a ≠ 0 is in R. Consider the set Ra = {xa | x ∈ R}.
We claim that Ra is an ideal of R. In order to establish this as fact we must
show that it is a subgroup of R under addition and that if u ∈ Ra and
r ∈ R then ru is also in Ra. (We only need to check that ru is in Ra, for
then ur also is, since ru = ur.)

Now, if u, v ∈ Ra, then u = r_1 a, v = r_2 a for some r_1, r_2 ∈ R. Thus
u + v = r_1 a + r_2 a = (r_1 + r_2)a ∈ Ra; similarly −u = −r_1 a = (−r_1)a ∈ Ra.
Hence Ra is an additive subgroup of R. Moreover, if r ∈ R, ru = r(r_1 a) =
(r r_1)a ∈ Ra. Ra therefore satisfies all the defining conditions for an ideal
of R, hence is an ideal of R. (Notice that both the distributive law and
associative law of multiplication were used in the proof of this fact.)

By our assumptions on R, Ra = (0) or Ra = R. Since 0 ≠ a = 1a ∈ Ra,
Ra ≠ (0); thus we are left with the only other possibility, namely that
Ra = R. This last equation states that every element in R is a multiple of
a by some element of R. In particular, 1 ∈ R and so it can be realized as a
multiple of a; that is, there exists an element b ∈ R such that ba = 1.
This completes the proof of the lemma. 

DEFINITION An ideal M ≠ R in a ring R is said to be a maximal ideal of
R if whenever U is an ideal of R such that M ⊂ U ⊂ R, then either R = U
or M = U.

In other words, an ideal of R is a maximal ideal if it is impossible to 
squeeze an ideal between it and the full ring. Given a ring R there is no
guarantee that it has any maximal ideals! If the ring has a unit element,
the existence of a maximal ideal can be proved, assuming a basic axiom of
mathematics, the so-called axiom of choice. Also there may be many distinct
maximal ideals in a ring R; this will be illustrated for us below in the ring
of integers.

As yet we have made acquaintance with very few rings. Only by
considering a given concept in many particular cases can one fully appreciate
the concept and its motivation. Before proceeding we therefore examine 
some maximal ideals in two specific rings. When we come to the discussion 
of polynomial rings we shall exhibit there all the maximal ideals. 

Example 3.5.1 Let R be the ring of integers, and let U be an ideal of R. 
Since U is a subgroup of R under addition, from our results in group theory, 
we know that U consists of all the multiples of a fixed integer n_0; we write
this as U = (n_0). What values of n_0 lead to maximal ideals?

We first assert that if p is a prime number then P = (p) is a maximal
ideal of R. For if U is an ideal of R and U ⊃ P, then U = (n_0) for some
integer n_0. Since p ∈ P ⊂ U, p = m n_0 for some integer m; because p is a
prime this implies that n_0 = 1 or n_0 = p. If n_0 = p, then P ⊂ U =
(n_0) ⊂ P, so that U = P follows; if n_0 = 1, then 1 ∈ U, hence r = 1·r ∈ U
for all r ∈ R whence U = R follows. Thus no ideal, other than R or P
itself, can be put between P and R, from which we deduce that P is maximal.

Suppose, on the other hand, that M = (n_0) is a maximal ideal of R.
We claim that n_0 must be a prime number, for if n_0 = ab, where a, b are
positive integers, then U = (a) ⊃ M, hence U = R or U = M. If U = R,
then a = 1 is an easy consequence; if U = M, then a ∈ M and so a = r n_0
for some integer r, since every element of M is a multiple of n_0. But then
n_0 = ab = r n_0 b, from which we get that rb = 1, so that b = 1, n_0 = a.
Thus n_0 is a prime number.

In this particular example the notion of maximal ideal comes alive—it 
corresponds exactly to the notion of prime number. One should not, 
however, jump to any hasty generalizations; this kind of correspondence 
does not usually hold for more general rings. 
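
The correspondence just described can be tried out on small cases. The following Python sketch is only an illustration under our own naming (the functions ideals_containing and is_maximal are not part of the text): it uses the fact, employed in the argument above, that an ideal (a) of the integers contains (n) exactly when a divides n, and then lists the n for which nothing fits strictly between (n) and the whole ring.

```python
def ideals_containing(n):
    """Generators a > 0 of the ideals (a) of the integers that contain (n).

    (n) is contained in (a) exactly when a divides n.
    """
    return [a for a in range(1, n + 1) if n % a == 0]


def is_maximal(n):
    """(n) is maximal if it is not the whole ring and the only ideals
    containing it are (n) itself and (1), the whole ring."""
    return n > 1 and ideals_containing(n) == [1, n]


print(ideals_containing(12))
print([n for n in range(2, 30) if is_maximal(n)])
```

The second list consists exactly of the primes below 30, in agreement with the example; of course a finite check of this kind illustrates the statement and proves nothing.
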

Example 3.5.2 Let R be the ring of all the real-valued, continuous
functions on the closed unit interval. (See Example 3.3.5.) Let

M = {f(x) ∈ R | f(1/2) = 0}.

M is certainly an ideal of R. Moreover, it is a maximal ideal of R, for if the
ideal U contains M and U ≠ M, then there is a function g(x) ∈ U,
g(x) ∉ M. Since g(x) ∉ M, g(1/2) = α ≠ 0. Now h(x) = g(x) − α is such
that h(1/2) = g(1/2) − α = 0, so that h(x) ∈ M ⊂ U. But g(x) is also in U;
therefore α = g(x) − h(x) ∈ U and so 1 = αα^(-1) ∈ U. Thus for any
function t(x) ∈ R, t(x) = 1·t(x) ∈ U, in consequence of which U = R.
M is therefore a maximal ideal of R. Similarly if γ is a real number 0 ≤
γ ≤ 1, then M_γ = {f(x) ∈ R | f(γ) = 0} is a maximal ideal of R. It
can be shown (see Problem 4 at the end of this section) that every maximal
ideal is of this form. Thus here the maximal ideals correspond to the points
on the unit interval.

Having seen some maximal ideals in some concrete rings we are ready 
to continue the general development with 

THEOREM 3.5.1 If R is a commutative ring with unit element and M is an
ideal of R, then M is a maximal ideal of R if and only if R/M is a field.

Proof. Suppose, first, that M is an ideal of R such that R/M is a field. 

Since R/M is a field its only ideals are (0) and R/M itself. But by Theorem 
3.4.1 there is a one-to-one correspondence between the set of ideals of 
R/M and the set of ideals of R which contain M. The ideal M of R
corresponds to the ideal (0) of R/M whereas the ideal R of R corresponds to
the ideal R/M of R/M in this one-to-one mapping. Thus there is no ideal 
between M and R other than these two, whence M is a maximal ideal. 

On the other hand, if M is a maximal ideal of R, by the correspondence 
mentioned above R/M has only (0) and itself as ideals. Furthermore R/M 
is commutative and has a unit element since R enjoys both these properties. 

All the conditions of Lemma 3.5.1 are fulfilled for R/M so we can conclude, 
by the result of that lemma, that R/M is a field. 

We shall have many occasions to refer back to this result in our study of 
polynomial rings and in the theory of field extensions. 
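
For the ring of integers the theorem can be watched at work numerically. The sketch below is a brute-force experiment, not a proof, and its function names (is_field_mod, is_prime) are our own. It tests directly whether every nonzero residue mod n has a multiplicative inverse, and compares the answer with the primality of n, which by Example 3.5.1 is the condition for (n) to be maximal.

```python
def is_field_mod(n):
    """Return True if the integers mod n form a field.

    We test the defining property directly: every nonzero residue a
    must have some b with a*b = 1 (mod n).
    """
    if n < 2:
        return False  # the one-element ring is not counted as a field here
    return all(
        any((a * b) % n == 1 for b in range(1, n))
        for a in range(1, n)
    )


def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    return all(n % k != 0 for k in range(2, int(n ** 0.5) + 1))


# (n) is maximal in the integers exactly when n is prime, so by the
# theorem the two tests should agree for every n >= 2.
agree = all(is_field_mod(n) == is_prime(n) for n in range(2, 60))
print(agree)
```
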

Problems 

1. Let R be a ring with unit element, R not necessarily commutative, such 
that the only right-ideals of R are (0) and R. Prove that R is a division 
ring. 

*2. Let R be a ring such that the only right ideals of R are (0) and R.
Prove that either R is a division ring or that R is a ring with a prime 
number of elements in which ab = 0 for every a, b e R. 

3. Let J be the ring of integers, p a prime number, and (p) the ideal of 
J consisting of all multiples of p. Prove 

(a) J/(p) is isomorphic to J_p, the ring of integers mod p.

(b) Using Theorem 3.5.1 and part (a) of this problem, that J_p is a
field. 

**4. Let R be the ring of all real-valued continuous functions on the closed 
unit interval. If M is a maximal ideal of R, prove that there exists a 
real number γ, 0 ≤ γ ≤ 1, such that M = M_γ = {f(x) ∈ R | f(γ) = 0}.

3.6 The Field of Quotients of an Integral Domain 

Let us recall that an integral domain is a commutative ring D with the 
additional property that it has no zero-divisors, that is, if ab = 0 for some 
a, b ∈ D then at least one of a or b must be 0. The ring of integers is, of
course, a standard example of an integral domain. 

The ring of integers has the attractive feature that we can enlarge it to 
the set of rational numbers, which is a field. Can we perform a similar
construction for any integral domain? We will now proceed to show that 
indeed we can! 

DEFINITION A ring R can be imbedded in a ring R' if there is an isomorphism
of R into R'. (If R and R' have unit elements 1 and 1' we insist, in addition,
that this isomorphism takes 1 onto 1'.)

R' will be called an over-ring or extension of R if R can be imbedded in R'. 
With this understanding of imbedding we prove 

THEOREM 3.6.1 Every integral domain can be imbedded in a field. 

Proof. Before becoming explicit in the details of the proof let us take an 
informal approach to the problem. Let D be our integral domain; roughly 
speaking the field we seek should be all quotients a/b, where a, b ∈ D and
b ≠ 0. Of course in D, a/b may very well be meaningless. What should
we require of these symbols a/b? Clearly we must have an answer to the 
following three questions: 

1. When is a/b = c/d?

2. What is (a/b) + (c/d)?

3. What is (a/b)(c/d)?

In answer to 1, what could be more natural than to insist that a/b = c/d 
if and only if ad = bc? As for 2 and 3, why not try the obvious, that is,
define

a/b + c/d = (ad + bc)/bd    and    (a/b)(c/d) = ac/bd?

In fact in what is to follow we make these considerations our guide. So 
let us leave the heuristics and enter the domain of mathematics, with 
precise definitions and rigorous deductions. 

Let ℳ be the set of all ordered pairs (a, b) where a, b ∈ D and b ≠ 0.
(Think of (a, b) as a/b.) In ℳ we now define a relation as follows:

(a, b) ~ (c, d) if and only if ad = bc.

We claim that this defines an equivalence relation on ℳ. To establish this
we check the three defining conditions for an equivalence relation for this
particular relation.

1. If (a, b) ∈ ℳ, then (a, b) ~ (a, b) since ab = ba.

2. If (a, b), (c, d) ∈ ℳ and (a, b) ~ (c, d), then ad = bc, hence cb = da,
and so (c, d) ~ (a, b).

3. If (a, b), (c, d), (e, f) are all in ℳ and (a, b) ~ (c, d) and (c, d) ~
(e, f), then ad = bc and cf = de. Thus bcf = bde, and since bc = ad,
it follows that adf = bde. Since D is commutative, this relation becomes
afd = bed; since, moreover, D is an integral domain and d ≠ 0, this
relation further implies that af = be. But then (a, b) ~ (e, f) and our
relation is transitive.


Let [a, b] be the equivalence class in ℳ of (a, b), and let F be the set of
all such equivalence classes [a, b] where a, b ∈ D and b ≠ 0. F is the
candidate for the field we are seeking. In order to create out of F a field
we must introduce an addition and a multiplication for its elements and then
show that under these operations F forms a field.

We first dispose of the addition. Motivated by our heuristic discussion at
the beginning of the proof we define

[a, b] + [c, d] = [ad + bc, bd].

Since D is an integral domain and both b ≠ 0 and d ≠ 0 we have that
bd ≠ 0; this, at least, tells us that [ad + bc, bd] ∈ F. We now assert that
this addition is well defined, that is, if [a, b] = [a', b'] and [c, d] = [c', d'],
then [a, b] + [c, d] = [a', b'] + [c', d']. To see that this is so, from
[a, b] = [a', b'] we have that ab' = ba'; from [c, d] = [c', d'] we have
that cd' = dc'. What we need is that these relations force the equality of
[a, b] + [c, d] and [a', b'] + [c', d']. From the definition of addition this
boils down to showing that [ad + bc, bd] = [a'd' + b'c', b'd'], or, in
equivalent terms, that (ad + bc)b'd' = bd(a'd' + b'c'). Using ab' = ba', cd' = dc'
this becomes: (ad + bc)b'd' = adb'd' + bcb'd' = ab'dd' + bb'cd' = ba'dd' +
bb'dc' = bd(a'd' + b'c'), which is the desired equality.

Clearly [0, b] acts as a zero-element for this addition and [ — a,b] as the 
negative of [a, b]. It is a simple matter to verify that F is an abelian group 
under this addition. 

We now turn to the multiplication in F. Again motivated by our
preliminary heuristic discussion we define [a, b][c, d] = [ac, bd]. As in the
case of addition, since b ≠ 0, d ≠ 0, bd ≠ 0 and so [ac, bd] ∈ F. A
computation very much in the spirit of the one just carried out proves that if
[a, b] = [a', b'] and [c, d] = [c', d'] then [a, b][c, d] = [a', b'][c', d']. One
can now show that the nonzero elements of F (that is, all the elements
[a, b] where a ≠ 0) form an abelian group under multiplication in which
[d, d] acts as the unit element and where

[c, d]^(-1) = [d, c] (since c ≠ 0, [d, c] is in F).


It is a routine computation to see that the distributive law holds in F. 
F is thus a field. 

All that remains is to show that D can be imbedded in F. We shall 
exhibit an explicit isomorphism of D into F. Before doing so we first notice
that for x ≠ 0, y ≠ 0 in D, [ax, x] = [ay, y] because (ax)y = x(ay); let us
denote [ax, x] by [a, 1]. Define φ: D → F by φ(a) = [a, 1] for every
a ∈ D. We leave it to the reader to verify that φ is an isomorphism of D
into F, and that if D has a unit element 1, then φ(1) is the unit element of F.
The theorem is now proved in its entirety. 

F is usually called the field of quotients of D. In the special case in which 
D is the ring of integers, the F so constructed is, of course, the field of 
rational numbers. 
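
To see the construction in action in the special case where D is the ring of integers, one may model the classes [a, b] directly. The Python sketch below is an illustration only: the class name Quotient and the helper embed are our own choices, and the standard-library Fraction type is used solely to compare the result with ordinary rational arithmetic. Equality, addition, multiplication and inverses are coded exactly as in the proof.

```python
from fractions import Fraction


class Quotient:
    """The class [a, b] of the pair (a, b), with a, b integers and b != 0."""

    def __init__(self, a, b):
        if b == 0:
            raise ValueError("second entry must be nonzero")
        self.a, self.b = a, b

    def __eq__(self, other):
        # [a, b] = [c, d] if and only if ad = bc
        return self.a * other.b == self.b * other.a

    def __add__(self, other):
        # [a, b] + [c, d] = [ad + bc, bd]
        return Quotient(self.a * other.b + self.b * other.a, self.b * other.b)

    def __mul__(self, other):
        # [a, b][c, d] = [ac, bd]
        return Quotient(self.a * other.a, self.b * other.b)

    def inverse(self):
        # [c, d]^(-1) = [d, c]; raises ValueError when c = 0
        return Quotient(self.b, self.a)

    def as_fraction(self):
        return Fraction(self.a, self.b)


def embed(a):
    """The imbedding a -> [a, 1] of D into F."""
    return Quotient(a, 1)


x, x2 = Quotient(1, 2), Quotient(3, 6)    # two representatives of one class
y = Quotient(2, 3)
print(x == x2)                            # the two pairs give the same class
print((x + y) == (x2 + y))                # the sum does not depend on the representative
print((x * y).as_fraction())              # compare with ordinary rational arithmetic
print(embed(2) + embed(3) == embed(5))    # the imbedding respects addition
```

Note that no reduction to lowest terms is ever performed; as in the text, two pairs are identified only through the test ad = bc.
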


Problems 

1. Prove that if [a, b] = [a', b'] and [c, d] = [c', d'] then [a, b][c, d] =
[a', b'][c', d'].

2. Prove the distributive law in F. 

3. Prove that the mapping φ: D → F defined by φ(a) = [a, 1] is an
isomorphism of D into F. 

4. Prove that if K is any field which contains D then K contains a subfield 
isomorphic to F. (In this sense F is the smallest field containing D.)

*5. Let R be a commutative ring with unit element. A nonempty subset 
S of R is called a multiplicative system if 

1. 0 ∉ S.

2. s_1, s_2 ∈ S implies that s_1 s_2 ∈ S.

Let ℳ be the set of all ordered pairs (r, s) where r ∈ R, s ∈ S. In
ℳ define (r, s) ~ (r', s') if there exists an element s'' ∈ S such that

s''(rs' − sr') = 0.

(a) Prove that this defines an equivalence relation on ℳ.

Let the equivalence class of (r, s) be denoted by [r, s], and let R_S be
the set of all the equivalence classes. In R_S define [r_1, s_1] + [r_2, s_2] =
[r_1 s_2 + r_2 s_1, s_1 s_2] and [r_1, s_1][r_2, s_2] = [r_1 r_2, s_1 s_2].

(b) Prove that the addition and multiplication described above are
well defined and that R_S forms a ring under these operations.

(c) Can R be imbedded in R_S?

(d) Prove that the mapping φ: R → R_S defined by φ(a) = [as, s] is
a homomorphism of R into R_S and find the kernel of φ.

(e) Prove that this kernel has no element of S in it.

(f) Prove that every element of the form [s_1, s_2] (where s_1, s_2 ∈ S) in
R_S has an inverse in R_S.

6. Let D be an integral domain, a, b ∈ D. Suppose that a^n = b^n and
a^m = b^m for two relatively prime positive integers m and n. Prove that
a = b.

7. Let R be a ring, possibly noncommutative, in which xy = 0 implies
x = 0 or y = 0. If a, b ∈ R and a^n = b^n and a^m = b^m for two relatively
prime positive integers m and n, prove that a = b.

3.7 Euclidean Rings 

The class of rings we propose to study now is motivated by several existing 
examples: the ring of integers, the Gaussian integers (Section 3.8), and
polynomial rings (Section 3.9). The definition of this class is designed to 
incorporate in it certain outstanding characteristics of the three concrete 
examples listed above. 

DEFINITION An integral domain R is said to be a Euclidean ring if for
every a ≠ 0 in R there is defined a nonnegative integer d(a) such that

1. For all a, b ∈ R, both nonzero, d(a) ≤ d(ab).

2. For any a, b ∈ R, both nonzero, there exist t, r ∈ R such that a = tb + r
where either r = 0 or d(r) < d(b).

We do not assign a value to d(0). The integers serve as an example of a
Euclidean ring, where d(a) = absolute value of a acts as the required
function. In the next section we shall see that the Gaussian integers also
form a Euclidean ring. Out of that observation, and the results developed
in this part, we shall prove a classic theorem in number theory due to
Fermat, namely, that every prime number of the form 4n + 1 can be
written as the sum of two squares. 

We begin with 

THEOREM 3.7.1 Let R be a Euclidean ring and let A be an ideal of R. Then
there exists an element a_0 ∈ A such that A consists exactly of all a_0 x as x ranges over R.

Proof. If A just consists of the element 0, put a_0 = 0 and the conclusion
of the theorem holds.

Thus we may assume that A ≠ (0); hence there is an a ≠ 0 in A. Pick
a nonzero a_0 ∈ A such that d(a_0) is minimal. (Since d takes on nonnegative
integer values this is always possible.)

Suppose that a ∈ A. By the properties of Euclidean rings there exist
t, r ∈ R such that a = t a_0 + r where r = 0 or d(r) < d(a_0). Since
a_0 ∈ A and A is an ideal of R, t a_0 is in A. Combined with a ∈ A this results
in a − t a_0 ∈ A; but r = a − t a_0, whence r ∈ A. If r ≠ 0 then d(r) < d(a_0),
giving us an element r in A whose d-value is smaller than that of a_0, in
contradiction to our choice of a_0 as the element in A of minimal d-value.
Consequently r = 0 and a = t a_0, which proves the theorem.
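
The idea of the proof, namely that a nonzero element of least d-value generates the ideal, can be observed in the ring of integers with d(a) = |a|. The sketch below is a finite experiment under our own assumptions: the function names are invented for illustration, and only those combinations of the generators with coefficients between −bound and bound are formed, so the element found is the true generator only when the bound is large enough to reach it (as it is in the example shown).

```python
from itertools import product


def sample_ideal(generators, bound):
    """Return some elements of the ideal of the integers generated by
    `generators`: all sums of x_i * g_i with |x_i| <= bound."""
    coeffs = range(-bound, bound + 1)
    return {
        sum(x * g for x, g in zip(xs, generators))
        for xs in product(coeffs, repeat=len(generators))
    }


def minimal_element(elements):
    """Pick a nonzero element of least d-value, where d(a) = |a|."""
    nonzero = [a for a in elements if a != 0]
    return min(nonzero, key=abs)


A = sample_ideal([12, 18], bound=5)
a0 = abs(minimal_element(A))
# Every sampled element should be a multiple of a0, as the theorem predicts.
print(a0, all(a % a0 == 0 for a in A))
```
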

We introduce the notation (a) = {xa | x ∈ R} to represent the ideal of
all multiples of a. 

DEFINITION An integral domain R with unit element is a principal ideal 
ring if every ideal A in R is of the form A = (a) for some a ∈ R.

Once we establish that a Euclidean ring has a unit element, in virtue of 
Theorem 3.7.1, we shall know that a Euclidean ring is a principal ideal ring. 
The converse, however, is false; there are principal ideal rings which are 
not Euclidean rings. [See the paper by T. Motzkin, Bulletin of the American 
Mathematical Society, Vol. 55 (1949), pages 1142-1146, entitled “The 
Euclidean algorithm.”] 

COROLLARY TO THEOREM 3.7.1 A Euclidean ring possesses a unit 
element. 

Proof. Let R be a Euclidean ring; then R is certainly an ideal of R, so
that by Theorem 3.7.1 we may conclude that R = (u_0) for some u_0 ∈ R.
Thus every element in R is a multiple of u_0. Therefore, in particular,
u_0 = u_0 c for some c ∈ R. If a ∈ R then a = x u_0 for some x ∈ R, hence
ac = (x u_0)c = x(u_0 c) = x u_0 = a. Thus c is seen to be the required unit
element.

DEFINITION If a ≠ 0 and b are in a commutative ring R then a is said
to divide b if there exists a c ∈ R such that b = ac. We shall use the symbol
a | b to represent the fact that a divides b and a ∤ b to mean that a does
not divide b.

The proof of the next remark is so simple and straightforward that we 
omit it. 

REMARK 1. If a | b and b | c then a | c.

2. If a | b and a | c then a | (b + c).

3. If a | b then a | bx for all x ∈ R.

DEFINITION If a, b ∈ R then d ∈ R is said to be a greatest common divisor
of a and b if

1. d | a and d | b.

2. Whenever c | a and c | b then c | d.

We shall use the notation d = (a, b) to denote that d is a greatest common
divisor of a and b.

LEMMA 3.7.1 Let R be a Euclidean ring. Then any two elements a and b in
R have a greatest common divisor d. Moreover d = λa + μb for some λ, μ ∈ R.

Proof. Let A be the set of all elements ra + sb where r, s range over R.
We claim that A is an ideal of R. For suppose that x, y ∈ A; therefore
x = r_1 a + s_1 b, y = r_2 a + s_2 b, and so x ± y = (r_1 ± r_2)a + (s_1 ± s_2)b ∈ A.
Similarly, for any u ∈ R, ux = u(r_1 a + s_1 b) = (u r_1)a + (u s_1)b ∈ A.

Since A is an ideal of R, by Theorem 3.7.1 there exists an element d ∈ A
such that every element in A is a multiple of d. By dint of the fact that
d ∈ A and that every element of A is of the form ra + sb, d = λa + μb
for some λ, μ ∈ R. Now by the corollary to Theorem 3.7.1, R has a unit
element 1; thus a = 1a + 0b ∈ A, b = 0a + 1b ∈ A. Being in A, they
are both multiples of d, whence d | a and d | b.

Suppose, finally, that c | a and c | b; then c | λa and c | μb so that c
certainly divides λa + μb = d. Therefore d has all the requisite conditions
for a greatest common divisor and the lemma is proved.
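
The lemma asserts only that λ and μ exist. In the ring of integers they can be computed by carrying along, through the successive divisions with remainder, an expression of each remainder in terms of a and b. The Python sketch below does this; it is a standard extended form of the Euclidean algorithm rather than the argument of the proof above, and the function name extended_gcd is our own.

```python
def extended_gcd(a, b):
    """Return (d, lam, mu) with d = lam*a + mu*b, where d >= 0 is a
    greatest common divisor of the integers a and b."""
    # Invariants: r0 = s0*a + t0*b and r1 = s1*a + t1*b throughout.
    r0, s0, t0 = a, 1, 0
    r1, s1, t1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    if r0 < 0:  # pass to the nonnegative associate
        r0, s0, t0 = -r0, -s0, -t0
    return r0, s0, t0


d, lam, mu = extended_gcd(84, 30)
print(d, lam, mu, lam * 84 + mu * 30)
```

The pair (λ, μ) is far from unique; the function returns whichever pair the sequence of divisions happens to produce.
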

DEFINITION Let R be a commutative ring with unit element. An 
element a ∈ R is a unit in R if there exists an element b ∈ R such that ab = 1.

Do not confuse a unit with a unit element! A unit in a ring is an element 
whose inverse is also in the ring. 

LEMMA 3.7.2 Let R be an integral domain with unit element and suppose that
for a, b ∈ R both a | b and b | a are true. Then a = ub, where u is a unit in R.

Proof. Since a | b, b = xa for some x ∈ R; since b | a, a = yb for some
y ∈ R. Thus b = x(yb) = (xy)b; but these are elements of an integral
domain, so that we can cancel the b and obtain xy = 1; y is thus a unit in
R and a = yb, proving the lemma.

DEFINITION Let R be a commutative ring with unit element. Two 
elements a and b in R are said to be associates if b = ua for some unit u in R. 

The relation of being associates is an equivalence relation. (Problem 1 
at the end of this section.) Note that in a Euclidean ring any two greatest 
common divisors of two given elements are associates (Problem 2). 

Up to this point we have, as yet, not made use of condition 1 in the
definition of a Euclidean ring, namely that d(a) ≤ d(ab) for b ≠ 0. We
now make use of it in the proof of

LEMMA 3.7.3 Let R be a Euclidean ring and a, b ∈ R. If b ≠ 0 is not a unit
in R, then d(a) < d(ab).

Proof. Consider the ideal A = (a) = {xa | x ∈ R} of R. By condition
1 for a Euclidean ring, d(a) ≤ d(xa) for x ≠ 0 in R. Thus the d-value of
a is the minimum for the d-value of any element in A. Now ab ∈ A; if
d(ab) = d(a), by the proof used in establishing Theorem 3.7.1, since the
d-value of ab is minimal in regard to A, every element in A is a multiple of
ab. In particular, since a ∈ A, a must be a multiple of ab; whence a = abx
for some x ∈ R. Since all this is taking place in an integral domain we
obtain bx = 1. In this way b is a unit in R, in contradiction to the fact that
it was not a unit. The net result of this is that d(a) < d(ab).

DEFINITION In the Euclidean ring R a nonunit π is said to be a prime
element of R if whenever π = ab, where a, b are in R, then one of a or b is a
unit in R.

A prime element is thus an element in R which cannot be factored in R 
in a nontrivial way. 

LEMMA 3.7.4 Let R be a Euclidean ring. Then every element in R is either a 
unit in R or can be written as the product of a finite number of prime elements of R.

Proof. The proof is by induction on d(a).

If d(a) = d(1) then a is a unit in R (Problem 3), and so in this case, the
assertion of the lemma is correct. 

We assume that the lemma is true for all elements x in R such that 
d(x) < d(a). On the basis of this assumption we aim to prove it for a. 
This would complete the induction and prove the lemma. 

If a is a prime element of R there is nothing to prove. So suppose that
a = bc where neither b nor c is a unit in R. By Lemma 3.7.3, d(b) < d(bc) =
d(a) and d(c) < d(bc) = d(a). Thus by our induction hypothesis b and c
can be written as a product of a finite number of prime elements of R;
b = π_1 π_2 ··· π_n, c = π'_1 π'_2 ··· π'_m where the π_i and π'_j are prime
elements of R. Consequently a = bc = π_1 π_2 ··· π_n π'_1 π'_2 ··· π'_m and in
this way a has been factored as a product of a finite number of prime elements.
This completes the proof.
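
For the integers, with d(a) = |a|, the induction in this proof translates directly into a recursive procedure. The sketch below is an illustration with function names of our own choosing, restricted to integers a ≥ 2 so that units and signs do not enter. It returns [a] when a is prime and otherwise splits a = bc with neither factor a unit and recurses on b and c, each of which has smaller d-value.

```python
def smallest_proper_divisor(a):
    """Return a divisor b of a with 1 < b < a, or None if a is prime."""
    b = 2
    while b * b <= a:
        if a % b == 0:
            return b
        b += 1
    return None


def factor(a):
    """Factor an integer a >= 2 into primes, following the induction:
    if a is prime we are done; otherwise write a = b*c with neither
    b nor c a unit and factor b and c separately."""
    b = smallest_proper_divisor(a)
    if b is None:
        return [a]
    c = a // b
    return factor(b) + factor(c)


print(factor(360))
```

The recursion must stop because d(b) and d(c) are strictly smaller than d(a), which is exactly what Lemma 3.7.3 guarantees in a general Euclidean ring.
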


DEFINITION In the Euclidean ring R, a and b in R are said to be relatively 
prime if their greatest common divisor is a unit of R. 


Since any associate of a greatest common divisor is a greatest common 
divisor, and since 1 is an associate of any unit, if a and b are relatively 
prime we may assume that (a, b) = 1.

LEMMA 3.7.5 Let R be a Euclidean ring. Suppose that for a, b, c ∈ R, a | bc
but (a, b) = 1. Then a | c.

Proof. As we have seen in Lemma 3.7.1, the greatest common divisor
of a and b can be realized in the form λa + μb. Thus by our assumptions,
λa + μb = 1. Multiplying this relation by c we obtain λac + μbc = c.
Now a | λac, always, and a | μbc since a | bc by assumption; therefore
a | (λac + μbc) = c. This is, of course, the assertion of the lemma.

We wish to show that prime elements in a Euclidean ring play the same
role that prime numbers play in the integers. If π in R is a prime element
of R and a ∈ R, then either π | a or (π, a) = 1, for, in particular, (π, a)
is a divisor of π so it must be π or 1 (or any unit). If (π, a) = 1, one-half
our assertion is true; if (π, a) = π, since (π, a) | a we get π | a, and the
other half of our assertion is true.


LEMMA 3.7.6 If π is a prime element in the Euclidean ring R and π | ab
where a, b ∈ R then π divides at least one of a or b.

Proof. Suppose that π does not divide a; then (π, a) = 1. Applying
Lemma 3.7.5 we are led to π | b.

COROLLARY If π is a prime element in the Euclidean ring R and π | a_1 a_2 ··· a_n
then π divides at least one of a_1, a_2, ..., a_n.

We carry the analogy between prime elements and prime numbers 
further and prove 

THEOREM 3.7.2 (Unique Factorization Theorem) Let R be a Euclidean
ring and a ≠ 0 a nonunit in R. Suppose that a = π_1 π_2 ··· π_n =
π'_1 π'_2 ··· π'_m where the π_i and π'_j are prime elements of R. Then n = m and each
π_i, 1 ≤ i ≤ n, is an associate of some π'_j, 1 ≤ j ≤ m, and conversely each π'_k
is an associate of some π_q.

Proof. Look at the relation a = π_1 π_2 ··· π_n = π'_1 π'_2 ··· π'_m. But π_1 | π_1 π_2 ··· π_n,
hence π_1 | π'_1 π'_2 ··· π'_m. By Lemma 3.7.6, π_1 must divide some π'_i; since π_1 and
π'_i are both prime elements of R and π_1 | π'_i they must be associates and
π'_i = u_1 π_1, where u_1 is a unit in R. Thus π_1 π_2 ··· π_n = π'_1 π'_2 ··· π'_m =
u_1 π_1 π'_1 ··· π'_(i−1) π'_(i+1) ··· π'_m; cancel off π_1 and we are left with π_2 ··· π_n =
u_1 π'_1 ··· π'_(i−1) π'_(i+1) ··· π'_m. Repeat the argument on this relation with π_2.
After n steps, the left side becomes 1, the right side a product of a certain
number of π' (the excess of m over n). This would force n ≤ m since the
π' are not units. Similarly, m ≤ n, so that n = m. In the process we have
also showed that every π_i has some π'_j as an associate and conversely.

Combining Lemma 3.7.4 and Theorem 3.7.2 we have that every nonzero 
element in a Euclidean ring R can be uniquely written (up to associates) as a product 
of prime elements or is a unit in R. 

We finish the section by determining all the maximal ideals in a Euclidean 
ring. 

In Theorem 3.7.1 we proved that any ideal A in the Euclidean ring R is of
the form A = (a_0) where (a_0) = {x a_0 | x ∈ R}. We now ask: What
conditions imposed on a_0 insure that A is a maximal ideal of R? For this
question we have a simple, precise answer, namely

LEMMA 3.7.7 The ideal A = (a_0) is a maximal ideal of the Euclidean ring
R if and only if a_0 is a prime element of R.

Proof. We first prove that if a_0 is not a prime element, then A = (a_0)
is not a maximal ideal. For, suppose that a_0 = bc where b, c ∈ R and
neither b nor c is a unit. Let B = (b); then certainly a_0 ∈ B so that A ⊂ B.
We claim that A ≠ B and that B ≠ R.

If B = R then 1 ∈ B so that 1 = xb for some x ∈ R, forcing b to be a
unit in R, which it is not. On the other hand, if A = B then b ∈ B = A
whence b = x a_0 for some x ∈ R. Combined with a_0 = bc this results in
a_0 = xc a_0, in consequence of which xc = 1. But this forces c to be a unit
in R, again contradicting our assumption. Therefore B is neither A nor R
and since A ⊂ B, A cannot be a maximal ideal of R.

Conversely, suppose that a_0 is a prime element of R and that U is an
ideal of R such that A = (a_0) ⊂ U ⊂ R. By Theorem 3.7.1, U = (u_0).
Since a_0 ∈ A ⊂ U = (u_0), a_0 = x u_0 for some x ∈ R. But a_0 is a prime
element of R, from which it follows that either x or u_0 is a unit in R. If u_0
is a unit in R then U = R (see Problem 5). If, on the other hand, x is a
unit in R, then x^(-1) ∈ R and the relation a_0 = x u_0 becomes u_0 = x^(-1) a_0 ∈ A
since A is an ideal of R. This implies that U ⊂ A; together with A ⊂ U
we conclude that U = A. Therefore there is no ideal of R which fits
strictly between A and R. This means that A is a maximal ideal of R.

Problems 

1. In a commutative ring with unit element prove that the relation a is 
an associate of b is an equivalence relation. 

2. In a Euclidean ring prove that any two greatest common divisors of 
a and b are associates. 

3. Prove that a necessary and sufficient condition that the element a in 
the Euclidean ring be a unit is that d(a) = d(1).

4. Prove that in a Euclidean ring (a, b) can be found as follows:

b = q_0 a + r_1, where d(r_1) < d(a)
a = q_1 r_1 + r_2, where d(r_2) < d(r_1)
r_1 = q_2 r_2 + r_3, where d(r_3) < d(r_2)
...
r_(n−1) = q_n r_n

and r_n = (a, b).

5. Prove that if an ideal U of a ring R contains a unit of R, then U = R. 

6. Prove that the units in a commutative ring with a unit element form
an abelian group.

7. Given two elements a, b in the Euclidean ring R their least common 
multiple c e R is an element in R such that a | c and b | c and such that 
whenever a | x and b | x for x e R then c | x. Prove that any two elements 
in the Euclidean ring R have a least common multiple in R. 

8. In Problem 7, if the least common multiple of a and b is denoted by
[a, b], prove that [a, b] = ab/(a, b).

3.8 A Particular Euclidean Ring 

An abstraction in mathematics gains in substance and importance when, 
particularized to a specific example, it sheds new light on this example. 
We are about to particularize the notion of a Euclidean ring to a concrete 
ring, the ring of Gaussian integers. Applying the general results obtained 
about Euclidean rings to the Gaussian integers we shall obtain a highly 
nontrivial theorem about prime numbers due to Fermat. 

Let J[i] denote the set of all complex numbers of the form a + bi where
a and b are integers. Under the usual addition and multiplication of
complex numbers J[i] forms an integral domain called the domain of Gaussian
integers.

Our first objective is to exhibit $J[i]$ as a Euclidean ring. In order to do
this we must first introduce a function $d(x)$ defined for every nonzero
element in $J[i]$ which satisfies

1. $d(x)$ is a nonnegative integer for every $x \neq 0$ in $J[i]$.

2. $d(x) \leq d(xy)$ for every $y \neq 0$ in $J[i]$.

3. Given $u, v \in J[i]$ there exist $t, r \in J[i]$ such that $v = tu + r$ where
$r = 0$ or $d(r) < d(u)$.

Our candidate for this function d is the following: if $x = a + bi \in J[i]$,
then $d(x) = a^2 + b^2$. The $d(x)$ so defined certainly satisfies property 1;
in fact, if $x \neq 0$ is in $J[i]$ then $d(x) \geq 1$. As is well known, for any two
complex numbers x, y (not necessarily in $J[i]$), $d(xy) = d(x)d(y)$; thus if x
and y are in addition in $J[i]$ and $y \neq 0$, then since $d(y) \geq 1$,
$d(x) = d(x) \cdot 1 \leq d(x)d(y) = d(xy)$, showing that condition 2 is satisfied. All our
effort now will be to show that condition 3 also holds for this function d in
$J[i]$. This is done in the proof of

THEOREM 3.8.1 J[i] is a Euclidean ring. 

Proof. As was remarked in the discussion above, to prove Theorem 3.8.1
we merely must show that, given $x, y \in J[i]$ with $x \neq 0$, there exist $t, r \in J[i]$ such
that $y = tx + r$ where $r = 0$ or $d(r) < d(x)$.

We first establish this for a very special case, namely, where y is arbitrary
in $J[i]$ but where x is an (ordinary) positive integer n. Suppose that
$y = a + bi$; by the division algorithm for the ring of integers we can find
integers u, v such that $a = un + u_1$ and $b = vn + v_1$ where $u_1$ and $v_1$ are
integers satisfying $|u_1| \leq \frac{1}{2}n$ and $|v_1| \leq \frac{1}{2}n$. Let $t = u + vi$ and $r = u_1 + v_1 i$;
then $y = a + bi = un + u_1 + (vn + v_1)i = (u + vi)n + u_1 + v_1 i = tn + r$.
Since $d(r) = d(u_1 + v_1 i) = u_1^2 + v_1^2 \leq n^2/4 + n^2/4 < n^2 = d(n)$,
we see that in this special case we have shown that $y = tn + r$ with $r = 0$
or $d(r) < d(n)$.

We now go to the general case; let $x \neq 0$ and y be arbitrary elements
in $J[i]$. Thus $x\bar{x}$ is a positive integer n where $\bar{x}$ is the complex conjugate of
x. Applying the result of the paragraph above to the elements $y\bar{x}$ and n we
see that there are elements $t, r \in J[i]$ such that $y\bar{x} = tn + r$ with $r = 0$
or $d(r) < d(n)$. Putting into this relation $n = x\bar{x}$ we obtain $d(y\bar{x} - tx\bar{x}) <
d(n) = d(x\bar{x})$; applying to this the fact that $d(y\bar{x} - tx\bar{x}) = d(y - tx)d(\bar{x})$
and $d(x\bar{x}) = d(x)d(\bar{x})$ we obtain that $d(y - tx)d(\bar{x}) < d(x)d(\bar{x})$. Since
$x \neq 0$, $d(\bar{x})$ is a positive integer, so this inequality simplifies to $d(y - tx) <
d(x)$. We represent $y = tx + r_0$, where $r_0 = y - tx$; thus t and $r_0$ are in
$J[i]$ and, as we saw above, $r_0 = 0$ or $d(r_0) = d(y - tx) < d(x)$. This
proves the theorem.
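
The proof is constructive, and its two steps can be traced mechanically. The following Python sketch is an illustration only and is not part of the original argument: it stores $a + bi$ as a pair of integers `(a, b)`, and the helper names `norm`, `mul`, `nearest_quotient`, `divmod_gauss` and `gcd_gauss` are hypothetical names chosen here. It divides $y\bar{x}$ by the integer $n = x\bar{x}$ coordinate by coordinate, exactly as in the special case, and then takes $r_0 = y - tx$.

```python
def norm(z):
    """d(a + bi) = a^2 + b^2."""
    a, b = z
    return a * a + b * b


def mul(z, w):
    """Product of two Gaussian integers given as (real, imag) pairs."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)


def nearest_quotient(a, n):
    """Integer u with |a - u*n| <= n/2, for a positive integer n."""
    return (2 * a + n) // (2 * n)


def divmod_gauss(y, x):
    """Return (t, r) with y = t*x + r and r = 0 or d(r) < d(x); x must be nonzero."""
    n = norm(x)                       # n = x * conj(x), a positive integer
    yx = mul(y, (x[0], -x[1]))        # y * conj(x)
    # Special case of the proof: divide y*conj(x) by the ordinary integer n.
    t = (nearest_quotient(yx[0], n), nearest_quotient(yx[1], n))
    tx = mul(t, x)
    r = (y[0] - tx[0], y[1] - tx[1])  # r_0 = y - t*x
    return t, r


def gcd_gauss(a, b):
    """Euclidean algorithm in J[i]; the answer is determined only up to a unit."""
    while b != (0, 0):
        a, b = b, divmod_gauss(a, b)[1]
    return a


t, r = divmod_gauss((7, 5), (2, 1))
assert norm(r) < norm((2, 1))
```

Rounding to the nearest multiple of n, rather than truncating, is what guarantees $|u_1| \leq n/2$ and $|v_1| \leq n/2$ and hence the strict inequality $d(r_0) < d(x)$.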

Since $J[i]$ has been proved to be a Euclidean ring, we are free to apply the
results established about this class of rings in the previous section to the
Euclidean ring we have at hand, $J[i]$.

LEMMA 3.8.1 Let p be a prime integer and suppose that for some integer c
relatively prime to p we can find integers x and y such that $x^2 + y^2 = cp$. Then
p can be written as the sum of squares of two integers, that is, there exist integers
a and b such that $p = a^2 + b^2$.

Proof. The ring of integers is a subring of $J[i]$. Suppose that the integer
p is also a prime element of $J[i]$. Since $cp = x^2 + y^2 = (x + yi)(x - yi)$,
by Lemma 3.7.6, $p \mid (x + yi)$ or $p \mid (x - yi)$ in $J[i]$. But if $p \mid (x + yi)$ then
$x + yi = p(u + vi)$ which would say that $x = pu$ and $y = pv$ so that p
also would divide $x - yi$. But then $p^2 \mid (x + yi)(x - yi) = cp$ from which we
would conclude that $p \mid c$ contrary to assumption. Similarly if $p \mid (x - yi)$.

Thus p is not a prime element in $J[i]$! In consequence of this,

    $p = (a + bi)(g + di)$

where $a + bi$ and $g + di$ are in $J[i]$ and where neither $a + bi$ nor $g + di$
is a unit in $J[i]$. But this means that neither $a^2 + b^2 = 1$ nor $g^2 + d^2 = 1$.
(See Problem 2.) From $p = (a + bi)(g + di)$ it follows easily that $p =
(a - bi)(g - di)$. Thus

    $p^2 = (a + bi)(g + di)(a - bi)(g - di) = (a^2 + b^2)(g^2 + d^2)$.

Therefore $(a^2 + b^2) \mid p^2$ so $a^2 + b^2 = 1$, $p$ or $p^2$; $a^2 + b^2 \neq 1$ since
$a + bi$ is not a unit in $J[i]$; $a^2 + b^2 \neq p^2$, otherwise $g^2 + d^2 = 1$, contrary
to the fact that $g + di$ is not a unit in $J[i]$. Thus the only possibility
left is that $a^2 + b^2 = p$ and the lemma is thereby established.

The odd prime numbers divide into two classes, those which have a
remainder of 1 on division by 4 and those which have a remainder of 3 on
division by 4. We aim to show that every prime number of the first kind
can be written as the sum of two squares, whereas no prime in the second
class can be so represented.

LEMMA 3.8.2 If p is a prime number of the form $4n + 1$, then we can solve
the congruence $x^2 \equiv -1 \bmod p$.

Proof. Let $x = 1 \cdot 2 \cdot 3 \cdots \frac{p-1}{2}$. Since $p - 1 = 4n$, in this product
for x there are an even number of terms, in consequence of which

$$x = (-1)(-2)(-3) \cdots \left(-\frac{p-1}{2}\right).$$

But $p - k \equiv -k \bmod p$, so that

$$x^2 = \left[1 \cdot 2 \cdots \frac{p-1}{2}\right]\left[(-1)(-2) \cdots \left(-\frac{p-1}{2}\right)\right]
\equiv 1 \cdot 2 \cdots \frac{p-1}{2} \cdot \frac{p+1}{2} \cdots (p-1)
= (p-1)! \equiv -1 \bmod p.$$

We are using here Wilson's theorem, proved earlier, namely that if p is
a prime number $(p-1)! \equiv -1 \bmod p$.

To illustrate this result, if $p = 13$,

    $x = 1 \cdot 2 \cdot 3 \cdot 4 \cdot 5 \cdot 6 = 720 \equiv 5 \bmod 13$ and $5^2 \equiv -1 \bmod 13$.
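
The same check can be run for other primes of the form $4n + 1$. The short Python sketch below is an illustration under the assumption that its argument really is such a prime (the function performs no primality test); the name `root_of_minus_one` is a hypothetical one chosen here.

```python
def root_of_minus_one(p):
    """x = 1*2*...*((p-1)/2) reduced mod p; assumes p is a prime with p % 4 == 1."""
    x = 1
    for k in range(1, (p - 1) // 2 + 1):
        x = (x * k) % p
    return x


for p in (5, 13, 17, 29):
    x = root_of_minus_one(p)
    assert (x * x + 1) % p == 0   # x^2 is congruent to -1 mod p
```

Reducing mod p after every multiplication keeps the intermediate numbers small without changing the residue of the final product.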


THEOREM 3.8.2 (Fermat) If p is a prime number of the form $4n + 1$,
then $p = a^2 + b^2$ for some integers a, b.

Proof. By Lemma 3.8.2 there exists an x such that $x^2 \equiv -1 \bmod p$.
The x can be chosen so that $0 \leq x \leq p - 1$ since we only need to use the
remainder of x on division by p. We can restrict the size of x even further,
namely to satisfy $|x| < p/2$. For if $x > p/2$, then $y = p - x$ satisfies
$y^2 \equiv -1 \bmod p$ but $|y| < p/2$. Thus we may assume that we have an
integer x such that $|x| < p/2$ and $x^2 + 1$ is a multiple of p, say $cp$. Now
$cp = x^2 + 1 < p^2/4 + 1 < p^2$, hence $c < p$ and so $p \nmid c$. Invoking
Lemma 3.8.1 we obtain that $p = a^2 + b^2$ for some integers a and b,
proving the theorem.
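
The quantities appearing in this proof are easy to compute for small primes. The Python sketch below is illustrative only: `small_root_of_minus_one` follows the reduction to $|x| < p/2$ described above, while `two_squares` is a plain exhaustive search for a and b (a stand-in chosen here for simplicity; the proof itself obtains them from Lemma 3.8.1, not by search). Both function names are hypothetical.

```python
from math import isqrt


def small_root_of_minus_one(p):
    """For a prime p = 4n + 1, return x with 0 < x < p/2 and x^2 = -1 mod p."""
    x = 1
    for k in range(1, (p - 1) // 2 + 1):   # Lemma 3.8.2: x = ((p-1)/2)! mod p
        x = (x * k) % p
    return p - x if x > p // 2 else x       # replace x by p - x when x > p/2


def two_squares(p):
    """Search for integers 0 <= a <= b with a^2 + b^2 = p; None if there are none."""
    for a in range(isqrt(p) + 1):
        b = isqrt(p - a * a)
        if a * a + b * b == p:
            return (a, b)
    return None


for p in (5, 13, 17, 29, 37, 41):
    x = small_root_of_minus_one(p)
    c = (x * x + 1) // p
    assert (x * x + 1) % p == 0 and 0 < c < p   # hence p does not divide c
    a, b = two_squares(p)
    assert a * a + b * b == p
```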


Problems 

1. Find all the units in $J[i]$.

2. If $a + bi$ is not a unit of $J[i]$ prove that $a^2 + b^2 > 1$.

3. Find the greatest common divisor in $J[i]$ of

(a) $3 + 4i$ and $4 - 3i$. (b) $11 + 7i$ and $18 - i$.

4. Prove that if p is a prime number of the form $4n + 3$, then there is
no x such that $x^2 \equiv -1 \bmod p$.

5. Prove that no prime of the form $4n + 3$ can be written as $a^2 + b^2$
where a and b are integers.

6. Prove that there is an infinite number of primes of the form $4n + 3$.

*7. Prove there exists an infinite number of primes of the form $4n + 1$.

*8. Determine all the prime elements in $J[i]$.

*9. Determine all positive integers which can be written as a sum of two 
squares (of integers). 

3.9 Polynomial Rings

Very early in our mathematical education—in fact in junior high school or 
early in high school itself—we are introduced to polynomials. For a seemingly 
endless amount of time we are drilled, to the point of utter boredom, in 
factoring them, multiplying them, dividing them, simplifying them. Facility 
in factoring a quadratic becomes confused with genuine mathematical 
talent. 

Later, at the beginning college level, polynomials make their appearance 
in a somewhat different setting. Now they are functions, taking on values, 
and we become concerned with their continuity, their derivatives, their 
integrals, their maxima and minima. 

We too shall be interested in polynomials but from neither of the above 
viewpoints. To us polynomials will simply be elements of a certain ring 
and we shall be concerned with algebraic properties of this ring. Our 
primary interest in them will be that they give us a Euclidean ring whose 
properties will be decisive in discussing fields and extensions of fields. 

Let F be a field. By the ring of polynomials in the indeterminate, x, written
as $F[x]$, we mean the set of all symbols $a_0 + a_1 x + \cdots + a_n x^n$, where n
can be any nonnegative integer and where the coefficients $a_0, a_1, \ldots, a_n$
are all in F. In order to make a ring out of $F[x]$ we must be able to recognize
when two elements in it are equal, we must be able to add and multiply
elements of $F[x]$ so that the axioms defining a ring hold true for $F[x]$.
This will be our initial goal.

We could avoid the phrase "the set of all symbols" used above by introducing
an appropriate apparatus of sequences but it seems more desirable
to follow a path which is somewhat familiar to most readers.

DEFINITION If $p(x) = a_0 + a_1 x + \cdots + a_m x^m$ and $q(x) = b_0 + b_1 x +
\cdots + b_n x^n$ are in $F[x]$, then $p(x) = q(x)$ if and only if for every integer
$i \geq 0$, $a_i = b_i$.

Thus two polynomials are declared to be equal if and only if their
corresponding coefficients are equal.

DEFINITION If $p(x) = a_0 + a_1 x + \cdots + a_m x^m$ and $q(x) = b_0 + b_1 x +
\cdots + b_n x^n$ are both in $F[x]$, then $p(x) + q(x) = c_0 + c_1 x + \cdots + c_t x^t$
where for each i, $c_i = a_i + b_i$.

In other words, add two polynomials by adding their coefficients and
collecting terms. To add $1 + x$ and $3 - 2x + x^2$ we consider $1 + x$ as
$1 + x + 0x^2$ and add, according to the recipe given in the definition, to
obtain as their sum $4 - x + x^2$.

The most complicated item, and the only one left for us to define for
$F[x]$, is the multiplication.

DEFINITION If $p(x) = a_0 + a_1 x + \cdots + a_m x^m$ and $q(x) = b_0 + b_1 x +
\cdots + b_n x^n$, then $p(x)q(x) = c_0 + c_1 x + \cdots + c_k x^k$ where $c_t = a_t b_0 +
a_{t-1} b_1 + a_{t-2} b_2 + \cdots + a_0 b_t$.

This definition says nothing more than: multiply the two polynomials
by multiplying out the symbols formally, use the relation $x^\alpha x^\beta = x^{\alpha + \beta}$,
and collect terms. Let us illustrate the definition with an example:

    $p(x) = 1 + x - x^2$, $q(x) = 2 + x^2 + x^3$.

Here $a_0 = 1$, $a_1 = 1$, $a_2 = -1$, $a_3 = a_4 = \cdots = 0$, and $b_0 = 2$, $b_1 = 0$,
$b_2 = 1$, $b_3 = 1$, $b_4 = b_5 = \cdots = 0$. Thus

    $c_0 = a_0 b_0 = 1 \cdot 2 = 2$,

    $c_1 = a_1 b_0 + a_0 b_1 = 1 \cdot 2 + 1 \cdot 0 = 2$,

    $c_2 = a_2 b_0 + a_1 b_1 + a_0 b_2 = (-1)(2) + 1 \cdot 0 + 1 \cdot 1 = -1$,

    $c_3 = a_3 b_0 + a_2 b_1 + a_1 b_2 + a_0 b_3 = (0)(2) + (-1)(0) + 1 \cdot 1 + 1 \cdot 1 = 2$,

    $c_4 = a_4 b_0 + a_3 b_1 + a_2 b_2 + a_1 b_3 + a_0 b_4$
        $= (0)(2) + (0)(0) + (-1)(1) + (1)(1) + (1)(0) = 0$,

    $c_5 = a_5 b_0 + a_4 b_1 + a_3 b_2 + a_2 b_3 + a_1 b_4 + a_0 b_5$
        $= (0)(2) + (0)(0) + (0)(1) + (-1)(1) + (1)(0) + (1)(0) = -1$,

    $c_6 = a_6 b_0 + a_5 b_1 + a_4 b_2 + a_3 b_3 + a_2 b_4 + a_1 b_5 + a_0 b_6$
        $= (0)(2) + (0)(0) + (0)(1) + (0)(1) + (-1)(0) + (1)(0) + (1)(0) = 0$,

    $c_7 = c_8 = \cdots = 0$.

Therefore according to our definition,

    $(1 + x - x^2)(2 + x^2 + x^3) = c_0 + c_1 x + \cdots = 2 + 2x - x^2 + 2x^3 - x^5$.

If you multiply these together high-school style you will see that you get
the same answer. Our definition of product is the one the reader has always
known.
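
The formula for $c_t$ is a convolution of the two coefficient sequences, which a few lines of Python make concrete. This is only a sketch: polynomials are stored as coefficient lists with the constant term first, and the name `poly_mul` is a hypothetical one chosen here.

```python
def poly_mul(p, q):
    """Coefficients of p(x)q(x); p and q are nonempty lists, constant term first."""
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b      # a_i * b_j contributes to the coefficient of x^(i+j)
    return c


# The worked example: (1 + x - x^2)(2 + x^2 + x^3)
assert poly_mul([1, 1, -1], [2, 0, 1, 1]) == [2, 2, -1, 2, 0, -1]
```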

Without further ado we assert that F[x] is a ring with these operations, 
its multiplication is commutative, and it has a unit element. We leave the 
verification of the ring axioms to the reader. 

DEFINITION If $f(x) = a_0 + a_1 x + \cdots + a_n x^n \neq 0$ and $a_n \neq 0$ then
the degree of $f(x)$, written as $\deg f(x)$, is n.

That is, the degree of $f(x)$ is the largest integer i for which the ith
coefficient of $f(x)$ is not 0. We do not define the degree of the zero
polynomial. We say a polynomial is a constant if its degree is 0. The degree
function defined on the nonzero elements of $F[x]$ will provide us with the
function $d(x)$ needed in order that $F[x]$ be a Euclidean ring.

LEMMA 3.9.1 If $f(x)$, $g(x)$ are two nonzero elements of $F[x]$, then

    $\deg (f(x)g(x)) = \deg f(x) + \deg g(x)$.

Proof. Suppose that $f(x) = a_0 + a_1 x + \cdots + a_m x^m$ and $g(x) = b_0 +
b_1 x + \cdots + b_n x^n$ and that $a_m \neq 0$ and $b_n \neq 0$. Therefore $\deg f(x) = m$
and $\deg g(x) = n$. By definition, $f(x)g(x) = c_0 + c_1 x + \cdots + c_k x^k$ where
$c_t = a_t b_0 + a_{t-1} b_1 + \cdots + a_1 b_{t-1} + a_0 b_t$. We claim that $c_{m+n} =
a_m b_n \neq 0$ and $c_i = 0$ for $i > m + n$. That $c_{m+n} = a_m b_n$ can be seen at a
glance by its definition. What about $c_i$ for $i > m + n$? $c_i$ is the sum of
terms of the form $a_j b_{i-j}$; since $i = j + (i - j) > m + n$ then either $j > m$
or $(i - j) > n$. But then one of $a_j$ or $b_{i-j}$ is 0, so that $a_j b_{i-j} = 0$; since $c_i$
is the sum of a bunch of zeros it itself is 0, and our claim has been
established. Thus the highest nonzero coefficient of $f(x)g(x)$ is $c_{m+n}$, whence
$\deg f(x)g(x) = m + n = \deg f(x) + \deg g(x)$.

COROLLARY If $f(x)$, $g(x)$ are nonzero elements in $F[x]$ then $\deg f(x) \leq
\deg f(x)g(x)$.

Proof. Since $\deg f(x)g(x) = \deg f(x) + \deg g(x)$, and since $\deg g(x) \geq 0$,
this result is immediate from the lemma.

COROLLARY F[x] is an integral domain. 

We leave the proof of this corollary to the reader. 

Since $F[x]$ is an integral domain, in light of Theorem 3.6.1 we can
construct for it its field of quotients. This field merely consists of all quotients
of polynomials and is called the field of rational functions in x over F.

The function $\deg f(x)$ defined for all $f(x) \neq 0$ in $F[x]$ satisfies

1. $\deg f(x)$ is a nonnegative integer.

2. $\deg f(x) \leq \deg f(x)g(x)$ for all $g(x) \neq 0$ in $F[x]$.

In order for $F[x]$ to be a Euclidean ring with the degree function acting as
the d-function of a Euclidean ring we still need that given $f(x), g(x) \in F[x]$,
there exist $t(x), r(x)$ in $F[x]$ such that $f(x) = t(x)g(x) + r(x)$ where either
$r(x) = 0$ or $\deg r(x) < \deg g(x)$. This is provided us by

LEMMA 3.9.2 (The Division Algorithm) Given two polynomials $f(x)$
and $g(x) \neq 0$ in $F[x]$, then there exist two polynomials $t(x)$ and $r(x)$ in $F[x]$ such
that $f(x) = t(x)g(x) + r(x)$ where $r(x) = 0$ or $\deg r(x) < \deg g(x)$.

Proof. The proof is actually nothing more than the “long-division” 
process we all used in school to divide one polynomial by another. 

If the degree of $f(x)$ is smaller than that of $g(x)$ there is nothing to prove,
for merely put $t(x) = 0$, $r(x) = f(x)$, and we certainly have that $f(x) =
0 \cdot g(x) + f(x)$ where $\deg f(x) < \deg g(x)$ or $f(x) = 0$.

So we may assume that $f(x) = a_0 + a_1 x + \cdots + a_m x^m$ and $g(x) = b_0 +
b_1 x + \cdots + b_n x^n$ where $a_m \neq 0$, $b_n \neq 0$ and $m \geq n$.

Let $f_1(x) = f(x) - (a_m/b_n)x^{m-n}g(x)$; thus $\deg f_1(x) \leq m - 1$, so by
induction on the degree of $f(x)$ we may assume that $f_1(x) = t_1(x)g(x) +
r(x)$ where $r(x) = 0$ or $\deg r(x) < \deg g(x)$. But then $f(x) - (a_m/b_n)x^{m-n}g(x) =
t_1(x)g(x) + r(x)$, from which, by transposing, we arrive at $f(x) =
((a_m/b_n)x^{m-n} + t_1(x))g(x) + r(x)$. If we put $t(x) = (a_m/b_n)x^{m-n} + t_1(x)$
we do indeed have that $f(x) = t(x)g(x) + r(x)$ where $t(x), r(x) \in F[x]$
and where $r(x) = 0$ or $\deg r(x) < \deg g(x)$. This proves the lemma.
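
The induction in this proof unwinds into a loop: at each step subtract $(a_m/b_n)x^{m-n}g(x)$ to kill the leading term. The Python sketch below is an illustration only, taking F to be the rationals (via the standard `fractions.Fraction` type) and storing polynomials as coefficient lists with the constant term first; the name `poly_divmod` is a hypothetical one chosen here, and the zero polynomial is represented by an empty remainder list.

```python
from fractions import Fraction


def poly_divmod(f, g):
    """Return (t, r) with f = t*g + r and r = [] (zero) or deg r < deg g."""
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:            # drop zero leading coefficients
        g.pop()
    if not g:
        raise ZeroDivisionError("g(x) must be nonzero")
    r = [Fraction(c) for c in f]
    while r and r[-1] == 0:
        r.pop()
    t = [Fraction(0)] * max(len(r) - len(g) + 1, 1)
    while len(r) >= len(g):
        shift = len(r) - len(g)        # m - n
        coeff = r[-1] / g[-1]          # a_m / b_n
        t[shift] = coeff
        for i, b in enumerate(g):      # r <- r - (a_m/b_n) x^(m-n) g(x)
            r[shift + i] -= coeff * b
        while r and r[-1] == 0:
            r.pop()
    return t, r


# x^3 - 2 = (x^2 + x + 1)(x - 1) - 1
t, r = poly_divmod([-2, 0, 0, 1], [-1, 1])
assert t == [1, 1, 1] and r == [-1]
```

Division by the leading coefficient $b_n$ is the one place where the field hypothesis on F is used.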

This last lemma fills the gap needed to exhibit $F[x]$ as a Euclidean ring
and we now have the right to say

THEOREM 3.9.1 $F[x]$ is a Euclidean ring.

All the results of Section 3.7 now carry over and we list these, for our
particular case, as the following lemmas. It could be very instructive for
the reader to try to prove these directly, adapting the arguments used in
Section 3.7 for our particular ring $F[x]$ and its Euclidean function, the
degree.

LEMMA 3.9.3 $F[x]$ is a principal ideal ring.

LEMMA 3.9.4 Given two polynomials $f(x)$, $g(x)$ in $F[x]$ they have a greatest
common divisor $d(x)$ which can be realized as $d(x) = \lambda(x)f(x) + \mu(x)g(x)$.

What corresponds to a prime element?

DEFINITION A polynomial $p(x)$ in $F[x]$ is said to be irreducible over F if
whenever $p(x) = a(x)b(x)$ with $a(x), b(x) \in F[x]$ then one of $a(x)$ or $b(x)$
has degree 0 (i.e., is a constant).

Irreducibility depends on the field; for instance the polynomial $x^2 + 1$
is irreducible over the real field but not over the complex field, for there
$x^2 + 1 = (x + i)(x - i)$ where $i^2 = -1$.

LEMMA 3.9.5 Any polynomial in $F[x]$ can be written in a unique manner as a
product of irreducible polynomials in $F[x]$.

LEMMA 3.9.6 The ideal $A = (p(x))$ in $F[x]$ is a maximal ideal if and only
if $p(x)$ is irreducible over F.

In Chapter 5 we shall return to take a much closer look at this field
$F[x]/(p(x))$, but for now we should like to compute an example.

Let F be the field of rational numbers and consider the polynomial
$p(x) = x^3 - 2$ in $F[x]$. As is easily verified, it is irreducible over F, whence
$F[x]/(x^3 - 2)$ is a field. What do its elements look like? Let $A = (x^3 - 2)$,
the ideal in $F[x]$ generated by $x^3 - 2$.

Any element in $F[x]/(x^3 - 2)$ is a coset of the form $f(x) + A$ of the
ideal A with $f(x)$ in $F[x]$. Now, given any polynomial $f(x) \in F[x]$, by
the division algorithm, $f(x) = t(x)(x^3 - 2) + r(x)$, where $r(x) = 0$ or
$\deg r(x) < \deg (x^3 - 2) = 3$. Thus $r(x) = a_0 + a_1 x + a_2 x^2$ where $a_0, a_1,
a_2$ are in F; consequently $f(x) + A = a_0 + a_1 x + a_2 x^2 + t(x)(x^3 - 2) +
A = a_0 + a_1 x + a_2 x^2 + A$ since $t(x)(x^3 - 2)$ is in A, hence by the addition
and multiplication in $F[x]/(x^3 - 2)$, $f(x) + A = (a_0 + A) +
a_1(x + A) + a_2(x + A)^2$. If we put $t = x + A$, then every element in
$F[x]/(x^3 - 2)$ is of the form $a_0 + a_1 t + a_2 t^2$ with $a_0, a_1, a_2$ in F. What about
t? Since $t^3 - 2 = (x + A)^3 - 2 = x^3 - 2 + A = A = 0$ (since A is
the zero element of $F[x]/(x^3 - 2)$) we see that $t^3 = 2$.

Also, if $a_0 + a_1 t + a_2 t^2 = b_0 + b_1 t + b_2 t^2$, then $(a_0 - b_0) + (a_1 - b_1)t +
(a_2 - b_2)t^2 = 0$, whence $(a_0 - b_0) + (a_1 - b_1)x + (a_2 - b_2)x^2$ is in
$A = (x^3 - 2)$. How can this be, since every nonzero element in A has degree at
least 3? Only if $(a_0 - b_0) + (a_1 - b_1)x + (a_2 - b_2)x^2 = 0$, that is, only
if $a_0 = b_0$, $a_1 = b_1$, $a_2 = b_2$. Thus every element in $F[x]/(x^3 - 2)$ has
a unique representation as $a_0 + a_1 t + a_2 t^2$ where $a_0, a_1, a_2 \in F$. By Lemma
3.9.6, $F[x]/(x^3 - 2)$ is a field. It would be instructive to see this directly;
all that it entails is proving that if $a_0 + a_1 t + a_2 t^2 \neq 0$ then it has an
inverse of the form $\alpha + \beta t + \gamma t^2$. Hence we must solve for $\alpha, \beta, \gamma$ in the
relation $(a_0 + a_1 t + a_2 t^2)(\alpha + \beta t + \gamma t^2) = 1$, where not all of $a_0, a_1, a_2$
are 0. Multiplying the relation out and using $t^3 = 2$ we obtain

    $(a_0\alpha + 2a_2\beta + 2a_1\gamma) + (a_1\alpha + a_0\beta + 2a_2\gamma)t + (a_2\alpha + a_1\beta + a_0\gamma)t^2 = 1$;

thus

    $a_0\alpha + 2a_2\beta + 2a_1\gamma = 1$,
    $a_1\alpha + a_0\beta + 2a_2\gamma = 0$,
    $a_2\alpha + a_1\beta + a_0\gamma = 0$.

We can try to solve these three equations in the three unknowns $\alpha, \beta, \gamma$.
When we do so we find that a solution exists if and only if

    $a_0^3 + 2a_1^3 + 4a_2^3 - 6a_0 a_1 a_2 \neq 0$.

Therefore the problem of proving directly that $F[x]/(x^3 - 2)$ is a field
boils down to proving that the only solution in rational numbers of

$$a_0^3 + 2a_1^3 + 4a_2^3 = 6a_0 a_1 a_2 \tag{1}$$

is the solution $a_0 = a_1 = a_2 = 0$. We now proceed to show this. If a
solution exists in rationals, by clearing of denominators we can show that
a solution exists where $a_0, a_1, a_2$ are integers. Thus we may assume that
$a_0, a_1, a_2$ are integers satisfying (1). We now assert that we may assume
that $a_0, a_1, a_2$ have no common divisor other than 1, for if $a_0 = b_0 d$,
$a_1 = b_1 d$, and $a_2 = b_2 d$, where d is their greatest common divisor, then
substituting in (1) we obtain $d^3(b_0^3 + 2b_1^3 + 4b_2^3) = d^3(6b_0 b_1 b_2)$, and so
$b_0^3 + 2b_1^3 + 4b_2^3 = 6b_0 b_1 b_2$. The problem has thus been reduced to
proving that (1) has no solutions in integers which are relatively prime.
But then (1) implies that $a_0^3$ is even, so that $a_0$ is even; substituting $a_0 = 2\alpha_0$
in (1) gives us $4\alpha_0^3 + a_1^3 + 2a_2^3 = 6\alpha_0 a_1 a_2$. Thus $a_1^3$, and so $a_1$, is even;
$a_1 = 2\alpha_1$. Substituting in (1) we obtain $2\alpha_0^3 + 4\alpha_1^3 + a_2^3 = 6\alpha_0 \alpha_1 a_2$.
Thus $a_2^3$, and so $a_2$, is even! But then $a_0, a_1, a_2$ have 2 as a common
factor! This contradicts that they are relatively prime, and we have proved
that the equation $a_0^3 + 2a_1^3 + 4a_2^3 = 6a_0 a_1 a_2$ has no rational solution
other than $a_0 = a_1 = a_2 = 0$. Therefore we can solve for $\alpha, \beta, \gamma$ and
$F[x]/(x^3 - 2)$ is seen, directly, to be a field.
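
For readers who want to see inverses in this field explicitly, here is a Python sketch. It is an illustration under stated assumptions, not part of the text: an element $a_0 + a_1 t + a_2 t^2$ is stored as a triple `(a0, a1, a2)`, the names `mul` and `inverse` are hypothetical, and the closed formulas for $\alpha, \beta, \gamma$ come from applying Cramer's rule to the three linear equations above (a step the text leaves to the reader).

```python
from fractions import Fraction


def mul(u, v):
    """Multiply a0 + a1 t + a2 t^2 by b0 + b1 t + b2 t^2, reducing with t^3 = 2."""
    a0, a1, a2 = u
    b0, b1, b2 = v
    return (a0 * b0 + 2 * (a1 * b2 + a2 * b1),
            a0 * b1 + a1 * b0 + 2 * a2 * b2,
            a0 * b2 + a1 * b1 + a2 * b0)


def inverse(u):
    """Inverse of a nonzero element, as a triple (alpha, beta, gamma) of rationals."""
    a0, a1, a2 = (Fraction(c) for c in u)
    # Determinant of the linear system; zero only when a0 = a1 = a2 = 0.
    det = a0**3 + 2 * a1**3 + 4 * a2**3 - 6 * a0 * a1 * a2
    if det == 0:
        raise ZeroDivisionError("the zero element has no inverse")
    alpha = (a0 * a0 - 2 * a1 * a2) / det
    beta = (2 * a2 * a2 - a0 * a1) / det
    gamma = (a1 * a1 - a0 * a2) / det
    return (alpha, beta, gamma)


u = (1, 2, 3)                       # the element 1 + 2t + 3t^2
assert mul(u, inverse(u)) == (1, 0, 0)
```

The determinant computed in `inverse` is exactly the expression $a_0^3 + 2a_1^3 + 4a_2^3 - 6a_0 a_1 a_2$ whose nonvanishing was established above.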

Problems 

1. Find the greatest common divisor of the following polynomials over 
F, the field of rational numbers: 

(a) $x^3 - 6x^2 + x + 4$ and $x^5 - 6x + 1$.

(b) $x^2 + 1$ and $x^6 + x^3 + x + 1$.

2. Prove that

(a) $x^2 + x + 1$ is irreducible over F, the field of integers mod 2.

(b) $x^2 + 1$ is irreducible over the integers mod 7.

(c) $x^3 - 9$ is irreducible over the integers mod 31.

(d) $x^3 - 9$ is reducible over the integers mod 11.

3. Let F, K be two fields, $F \subset K$, and suppose $f(x), g(x) \in F[x]$ are
relatively prime in $F[x]$. Prove that they are relatively prime in $K[x]$.

4. (a) Prove that $x^2 + 1$ is irreducible over the field F of integers mod 11
and prove directly that $F[x]/(x^2 + 1)$ is a field having 121 elements.

(b) Prove that $x^2 + x + 4$ is irreducible over F, the field of integers
mod 11 and prove directly that $F[x]/(x^2 + x + 4)$ is a field
having 121 elements.

*(c) Prove that the fields of part (a) and part (b) are isomorphic. 

5. Let F be the field of real numbers. Prove that F[x]/(x 2 + 1) is a field 
isomorphic to the field of complex numbers. 

*6. Define the derivative $f'(x)$ of the polynomial
$f(x) = a_0 + a_1 x + \cdots + a_n x^n$
as $f'(x) = a_1 + 2a_2 x + 3a_3 x^2 + \cdots + n a_n x^{n-1}$.
Prove that if $f(x) \in F[x]$, where F is the field of rational numbers, then
$f(x)$ is divisible by the square of a polynomial if and only if $f(x)$ and
$f'(x)$ have a greatest common divisor $d(x)$ of positive degree.

7. If $f(x)$ is in $F[x]$, where F is the field of integers mod p, p a prime,
and $f(x)$ is irreducible over F of degree n prove that $F[x]/(f(x))$ is a
field with $p^n$ elements.

3.10 Polynomials over the Rational Field 

We specialize the general discussion to that of polynomials whose
coefficients are rational numbers. Most of the time the coefficients will
actually be integers. For such polynomials we shall be concerned with their
irreducibility.

DEFINITION The polynomial $f(x) = a_0 + a_1 x + \cdots + a_n x^n$, where the
$a_0, a_1, a_2, \ldots, a_n$ are integers is said to be primitive if the greatest common
divisor of $a_0, a_1, \ldots, a_n$ is 1.

LEMMA 3.10.1 If $f(x)$ and $g(x)$ are primitive polynomials, then $f(x)g(x)$
is a primitive polynomial.

Proof. Let $f(x) = a_0 + a_1 x + \cdots + a_n x^n$ and $g(x) = b_0 + b_1 x + \cdots +
b_m x^m$. Suppose that the lemma was false; then all the coefficients of
$f(x)g(x)$ would be divisible by some integer larger than 1, hence by some
prime number p. Since $f(x)$ is primitive, p does not divide some coefficient
$a_i$. Let $a_j$ be the first coefficient of $f(x)$ which p does not divide. Similarly
let $b_k$ be the first coefficient of $g(x)$ which p does not divide. In $f(x)g(x)$
the coefficient of $x^{j+k}$, $c_{j+k}$, is

$$c_{j+k} = a_j b_k + (a_{j+1} b_{k-1} + a_{j+2} b_{k-2} + \cdots + a_{j+k} b_0)
+ (a_{j-1} b_{k+1} + a_{j-2} b_{k+2} + \cdots + a_0 b_{j+k}). \tag{1}$$

Now by our choice of $b_k$, $p \mid b_{k-1}, b_{k-2}, \ldots$ so that $p \mid (a_{j+1} b_{k-1} + a_{j+2} b_{k-2} +
\cdots + a_{j+k} b_0)$. Similarly, by our choice of $a_j$, $p \mid a_{j-1}, a_{j-2}, \ldots$ so that
$p \mid (a_{j-1} b_{k+1} + a_{j-2} b_{k+2} + \cdots + a_0 b_{k+j})$. By assumption, $p \mid c_{j+k}$. Thus
by (1), $p \mid a_j b_k$, which is nonsense since $p \nmid a_j$ and $p \nmid b_k$. This proves
the lemma.

DEFINITION The content of the polynomial $f(x) = a_0 + a_1 x + \cdots +
a_n x^n$, where the a's are integers, is the greatest common divisor of the
integers $a_0, a_1, \ldots, a_n$.

Clearly, given any polynomial p(x) with integer coefficients it can be 
written as p(x) = dq(x ) where d is the content of p(x) and where q(x) is a 
primitive polynomial. 
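
The decomposition $p(x) = d\,q(x)$, and the behaviour of primitive polynomials under multiplication asserted in Lemma 3.10.1, can be checked numerically. The Python sketch below is illustrative only: polynomials are nonzero lists of integer coefficients with the constant term first, and the names `content`, `primitive_part` and `poly_mul` are hypothetical ones chosen here.

```python
from functools import reduce
from math import gcd


def content(coeffs):
    """Greatest common divisor of the integer coefficients."""
    return reduce(gcd, coeffs, 0)


def primitive_part(coeffs):
    """q(x) in the decomposition p(x) = d q(x); assumes p(x) is not zero."""
    d = content(coeffs)
    return [c // d for c in coeffs]


def poly_mul(p, q):
    """Coefficients of the product p(x)q(x)."""
    c = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            c[i + j] += a * b
    return c


assert content([6, 10, 4]) == 2
assert primitive_part([6, 10, 4]) == [3, 5, 2]
# 2 + 3x and 3 + 2x are primitive, and so is their product 6 + 13x + 6x^2.
assert content(poly_mul([2, 3], [3, 2])) == 1
```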

THEOREM 3.10.1 (Gauss' Lemma) If the primitive polynomial $f(x)$ can
be factored as the product of two polynomials having rational coefficients, it can be
factored as the product of two polynomials having integer coefficients.

Proof. Suppose that $f(x) = u(x)v(x)$ where $u(x)$ and $v(x)$ have rational
coefficients. By clearing of denominators and taking out common factors
we can then write $f(x) = (a/b)\lambda(x)\mu(x)$ where a and b are integers and
where both $\lambda(x)$ and $\mu(x)$ have integer coefficients and are primitive.
Thus $bf(x) = a\lambda(x)\mu(x)$. The content of the left-hand side is b, since
$f(x)$ is primitive; since both $\lambda(x)$ and $\mu(x)$ are primitive, by Lemma 3.10.1
$\lambda(x)\mu(x)$ is primitive, so that the content of the right-hand side is a.
Therefore $a = b$, $(a/b) = 1$, and $f(x) = \lambda(x)\mu(x)$ where $\lambda(x)$ and $\mu(x)$ have
integer coefficients. This is the assertion of the theorem.

DEFINITION A polynomial is said to be integer monic if all its coefficients 
are integers and its highest coefficient is 1. 

Thus an integer monic polynomial is merely one of the form $x^n +
a_1 x^{n-1} + \cdots + a_n$ where the a's are integers. Clearly an integer monic
polynomial is primitive.

COROLLARY If an integer monic polynomial factors as the product of two
nonconstant polynomials having rational coefficients then it factors as the product of two
integer monic polynomials.

We leave the proof of the corollary as an exercise for the reader. 

The question of deciding whether a given polynomial is irreducible or not 
can be a difficult and laborious one. Few criteria exist which declare that a 
given polynomial is or is not irreducible. One of these few is the following 
result: 

THEOREM 3.10.2 (The Eisenstein Criterion) Let $f(x) = a_0 + a_1 x +
a_2 x^2 + \cdots + a_n x^n$ be a polynomial with integer coefficients. Suppose that for
some prime number p, $p \nmid a_n$, $p \mid a_0$, $p \mid a_1$, ..., $p \mid a_{n-1}$, $p^2 \nmid a_0$. Then $f(x)$ is
irreducible over the rationals.

Proof. Without loss of generality we may assume that $f(x)$ is primitive,
for taking out the greatest common factor of its coefficients does not disturb
the hypotheses, since $p \nmid a_n$. If $f(x)$ factors as a product of two rational
polynomials, by Gauss' lemma it factors as the product of two polynomials
having integer coefficients. Thus if we assume that $f(x)$ is reducible, then

    $f(x) = (b_0 + b_1 x + \cdots + b_r x^r)(c_0 + c_1 x + \cdots + c_s x^s)$,

where the b's and c's are integers and where $r > 0$ and $s > 0$. Reading off
the coefficients we first get $a_0 = b_0 c_0$. Since $p \mid a_0$, p must divide one of
$b_0$ or $c_0$. Since $p^2 \nmid a_0$, p cannot divide both $b_0$ and $c_0$. Suppose that $p \mid b_0$,
$p \nmid c_0$. Not all the coefficients $b_0, \ldots, b_r$ can be divisible by p; otherwise
all the coefficients of $f(x)$ would be divisible by p, which is manifestly false
since $p \nmid a_n$. Let $b_k$ be the first b not divisible by p, $k \leq r < n$. Thus
$p \mid b_{k-1}$ and the earlier b's. But $a_k = b_k c_0 + b_{k-1} c_1 + b_{k-2} c_2 + \cdots + b_0 c_k$,
and $p \mid a_k$, $p \mid b_{k-1}, b_{k-2}, \ldots, b_0$, so that $p \mid b_k c_0$. However, $p \nmid c_0$, $p \nmid b_k$,
which conflicts with $p \mid b_k c_0$. This contradiction proves that we could not
have factored $f(x)$ and so $f(x)$ is indeed irreducible.
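
Checking the hypotheses of the criterion for a given polynomial and prime is purely mechanical, as the following Python sketch shows. It is an illustration only: coefficients are listed with the constant term first, the degree is assumed to be at least 1, p is assumed to be prime (the function does not test this), and the name `eisenstein_applies` is a hypothetical one chosen here. A result of `False` says only that the criterion does not apply for that p, not that the polynomial is reducible.

```python
def eisenstein_applies(coeffs, p):
    """True if p divides a_0, ..., a_(n-1), p does not divide a_n, p^2 does not divide a_0."""
    *lower, lead = coeffs
    return (lead % p != 0
            and all(a % p == 0 for a in lower)
            and coeffs[0] % (p * p) != 0)


assert eisenstein_applies([-2, 0, 0, 1], 2)       # x^3 - 2 with p = 2
assert not eisenstein_applies([1, 0, 1], 2)       # x^2 + 1: criterion silent for p = 2
```

In particular the polynomial $x^3 - 2$ of Section 3.9 satisfies the hypotheses with $p = 2$.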

Problems 

1. Let D be a Euclidean ring, F its field of quotients. Prove the Gauss 
Lemma for polynomials with coefficients in D factored as products of 
polynomials with coefficients in F. 

2. If p is a prime number, prove that the polynomial $x^n - p$ is irreducible
over the rationals.

3. Prove that the polynomial $1 + x + \cdots + x^{p-1}$, where p is a prime
number, is irreducible over the field of rational numbers. (Hint: Consider
the polynomial $1 + (x + 1) + (x + 1)^2 + \cdots + (x + 1)^{p-1}$, and
use the Eisenstein criterion.)

4. If m and n are relatively prime integers and if

    $(x - m/n) \mid (a_0 + a_1 x + \cdots + a_r x^r)$,

where the a's are integers, prove that $m \mid a_0$ and $n \mid a_r$.

5. If a is rational and x — a divides an integer monic polynomial, prove 
that a must be an integer. 

3.11 Polynomial Rings over Commutative Rings 

In defining the polynomial ring in one variable over a field F, no essential
use was made of the fact that F was a field; all that was used was that F was
a commutative ring. The field nature of F only made itself felt in proving
that $F[x]$ was a Euclidean ring.

Thus we can imitate what we did with fields for more general rings.
While some properties may be lost, such as "Euclideanism," we shall see
that enough remain to lead us to interesting results. The subject could have
been developed in this generality from the outset, and we could have
obtained the particular results about $F[x]$ by specializing the ring to be a
field. However, we felt that it would be healthier to go from the concrete
to the abstract rather than from the abstract to the concrete. The price we
pay for this is repetition, but even that serves a purpose, namely, that of
consolidating the ideas. Because of the experience gained in treating
polynomials over fields, we can afford to be a little sketchier in the proofs here.

Let R be a commutative ring with unit element. By the polynomial ring
in x over R, $R[x]$, we shall mean the set of formal symbols $a_0 + a_1 x + \cdots +
a_m x^m$, where $a_0, a_1, \ldots, a_m$ are in R, and where equality, addition, and
multiplication are defined exactly as they were in Section 3.9. As in that
section, $R[x]$ is a commutative ring with unit element.

We now define the ring of polynomials in the n-variables $x_1, \ldots, x_n$ over R,
$R[x_1, \ldots, x_n]$, as follows: Let $R_1 = R[x_1]$, $R_2 = R_1[x_2]$, the polynomial
ring in $x_2$ over $R_1$, ..., $R_n = R_{n-1}[x_n]$. $R_n$ is called the ring of polynomials
in $x_1, \ldots, x_n$ over R. Its elements are of the form $\sum a_{i_1 i_2 \cdots i_n} x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n}$,
where equality and addition are defined coefficientwise and where multiplication
is defined by use of the distributive law and the rule of exponents
$(x_1^{i_1} x_2^{i_2} \cdots x_n^{i_n})(x_1^{j_1} x_2^{j_2} \cdots x_n^{j_n}) = x_1^{i_1 + j_1} x_2^{i_2 + j_2} \cdots x_n^{i_n + j_n}$. Of particular
importance is the case in which $R = F$ is a field; here we obtain the ring
of polynomials in n-variables over a field.

Of interest to us will be the influence of the structure of R on that of
$R[x_1, \ldots, x_n]$. The first result in this direction is

LEMMA 3.11.1 If R is an integral domain, then so is $R[x]$.

Proof. For $0 \neq f(x) = a_0 + a_1 x + \cdots + a_m x^m$, where $a_m \neq 0$, in $R[x]$,
we define the degree of $f(x)$ to be m; thus $\deg f(x)$ is the index of the highest
nonzero coefficient of $f(x)$. If R is an integral domain we leave it as an
exercise to prove that $\deg (f(x)g(x)) = \deg f(x) + \deg g(x)$. But then,
for $f(x) \neq 0$, $g(x) \neq 0$, it is impossible to have $f(x)g(x) = 0$. That is,
$R[x]$ is an integral domain.

Making successive use of the lemma immediately yields the

COROLLARY If R is an integral domain, then so is $R[x_1, \ldots, x_n]$.

In particular, when F is a field, $F[x_1, \ldots, x_n]$ must be an integral domain.
As such, we can construct its field of quotients; we call this the field of rational
functions in $x_1, \ldots, x_n$ over F and denote it by $F(x_1, \ldots, x_n)$. This field
plays a vital role in algebraic geometry. For us it shall be of utmost
importance in our discussion, in Chapter 5, of Galois theory.

However, we want deeper interrelations between the structures of R and
of $R[x_1, \ldots, x_n]$ than that expressed in Lemma 3.11.1. Our development
now turns in that direction.

Exactly in the same way as we did for Euclidean rings, we can speak
about divisibility, units, etc., in arbitrary integral domains, R, with unit
element. Two elements a, b in R are said to be associates if $a = ub$ where u
is a unit in R. An element a which is not a unit in R will be called irreducible
(or a prime element) if, whenever $a = bc$ with b, c both in R, then one of b or
c must be a unit in R. An irreducible element is thus an element which
cannot be factored in a "nontrivial" way.

DEFINITION An integral domain, R, with unit element is a unique 
factorization domain if 

a. Any nonzero element in R is either a unit or can be written as the product 
of a finite number of irreducible elements of R. 

b. The decomposition in part (a) is unique up to the order and associates 
of the irreducible elements. 

Theorem 3.7.2 asserts that a Euclidean ring is a unique factorization 
domain. The converse, however, is false; for example, the ring $F[x_1, x_2]$,
where F is a field, is not even a principal ideal ring (hence is certainly not 
Euclidean), but as we shall soon see it is a unique factorization domain. 

In general commutative rings we may speak about the greatest common 
divisors of elements; the main difficulty is that these, in general, might not 
exist. However, in unique factorization domains their existence is assured. 
This fact is not difficult to prove and we leave it as an exercise; equally easy 
are the other parts of 

LEMMA 3.11.2 If R is a unique factorization domain and if a, b are in R, then
a and b have a greatest common divisor (a, b) in R. Moreover, if a and b are
relatively prime (i.e., (a, b) = 1), whenever a | bc then a | c.

COROLLARY If $a \in R$ is an irreducible element and a | bc, then a | b or a | c.
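
To see why unique factorization guarantees a greatest common divisor, it may help to compute one directly from factorizations. The sketch below is an illustration of ours, with R taken to be the ring of integers and restricted to positive integers; `factorize` and `gcd_from_factorizations` are hypothetical names. The greatest common divisor is read off by taking each common irreducible factor to the smaller of its two exponents.

```python
from collections import Counter
from math import gcd


def factorize(n):
    """Prime factorization of a positive integer as a Counter {prime: exponent}."""
    factors = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] += 1
            n //= p
        p += 1
    if n > 1:
        factors[n] += 1
    return factors


def gcd_from_factorizations(a, b):
    """Greatest common divisor built from the irreducible factors shared by a and b."""
    fa, fb = factorize(a), factorize(b)
    result = 1
    for p in fa.keys() & fb.keys():
        result *= p ** min(fa[p], fb[p])
    return result


# 84 = 2^2 * 3 * 7 and 90 = 2 * 3^2 * 5 share one factor 2 and one factor 3.
d = gcd_from_factorizations(84, 90)
```

The result agrees with the Euclidean algorithm (`math.gcd`) on the integers, but the construction itself uses only the factorizations, which is all that a general unique factorization domain provides.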

We now wish to transfer the appropriate version of the Gauss lemma
(Theorem 3.10.1), which we proved for polynomials with integer
coefficients, to the ring R[x], where R is a unique factorization domain.

Given the polynomial $f(x) = a_0 + a_1 x + \cdots + a_m x^m$ in R[x], then the
content of f(x) is defined to be the greatest common divisor of $a_0, a_1, \ldots, a_m$.
It is unique within units of R. We shall denote the content of f(x) by c(f).
A polynomial in R[x] is said to be primitive if its content is 1 (that is, is a
unit in R). Given any polynomial $f(x) \in R[x]$, we can write $f(x) = a f_1(x)$
where a = c(f) and where $f_1(x) \in R[x]$ is primitive. (Prove!) Except for
multiplication by units of R this decomposition of f(x), as the product of an
element of R and a primitive polynomial in R[x], is unique. (Prove!)
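
For a concrete picture of the content and the primitive part, take R to be the ring of integers. The following is a minimal sketch of ours (coefficient lists with lowest degree first; `content` and `primitive_part` are hypothetical names), and it assumes the polynomial is nonzero. It fixes the content up to sign by taking the nonnegative greatest common divisor.

```python
from functools import reduce
from math import gcd


def content(f):
    """Greatest common divisor of the coefficients of a nonzero integer polynomial."""
    return reduce(gcd, f)


def primitive_part(f):
    """The polynomial f_1 with f = c(f) * f_1; its content is 1."""
    c = content(f)
    return [a // c for a in f]


f = [6, -9, 15]          # 6 - 9x + 15x^2
c = content(f)           # gcd(6, 9, 15)
f1 = primitive_part(f)   # f divided by its content
```

Here f(x) = 3(2 - 9x/3 + 5x^2) becomes 3 times the primitive polynomial 2 - 3x + 5x^2; the other choice of content, -3, differs from this one by the unit -1.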

The proof of Lemma 3.10.1 goes over completely to our present situation;
the only change that must be made in the proof is to replace the prime
number p by an irreducible element of R. Thus we have

LEMMA 3.11.3 If R is a unique factorization domain, then the product of two
primitive polynomials in R[x] is again a primitive polynomial in R[x].

Given f(x), g(x) in R[x] we can write $f(x) = a f_1(x)$, $g(x) = b g_1(x)$,
where a = c(f), b = c(g) and where $f_1(x)$ and $g_1(x)$ are primitive. Thus
$f(x)g(x) = ab f_1(x) g_1(x)$. By Lemma 3.11.3, $f_1(x) g_1(x)$ is primitive. Hence
the content of f(x)g(x) is ab, that is, it is c(f)c(g). We have proved the

COROLLARY If R is a unique factorization domain and if f(x), g(x) are in
R[x], then c(fg) = c(f)c(g) (up to units).
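
The corollary can be spot-checked numerically when R is the ring of integers. The sketch below is ours and is only a finite check, not a proof: `content`, `poly_mul`, and `check_all` are hypothetical helpers, polynomials are coefficient sequences with lowest degree first, and the content is normalized to be nonnegative so that "up to units" becomes plain equality.

```python
from functools import reduce
from itertools import product
from math import gcd


def content(f):
    """Nonnegative gcd of the coefficients of a nonzero integer polynomial."""
    return reduce(gcd, f)


def poly_mul(f, g):
    """Product of two integer polynomials given as coefficient sequences."""
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return prod


def check_all(bound=3):
    """Test c(fg) = c(f)c(g) for all nonzero f of degree <= 1 and g of degree <= 2
    with coefficients between -bound and bound."""
    coeffs = range(-bound, bound + 1)
    for f in product(coeffs, repeat=2):
        for g in product(coeffs, repeat=3):
            if not any(f) or not any(g):
                continue  # skip the zero polynomial
            if content(poly_mul(f, g)) != content(f) * content(g):
                return False
    return True


# (4 + 6x)(3 + 9x^2) = 12 + 18x + 36x^2 + 54x^3, with contents 2, 3 and 6.
example = content(poly_mul([4, 6], [3, 0, 9]))
```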

By a simple induction, the corollary extends to the product of a finite
number of polynomials to read $c(f_1 f_2 \cdots f_k) = c(f_1)c(f_2) \cdots c(f_k)$.

Let R be a unique factorization domain. Being an integral domain, by
Theorem 3.6.1, it has a field of quotients F. We can consider R[x] to be a
subring of F[x]. Given any polynomial $f(x) \in F[x]$, then $f(x) = f_0(x)/a$,
where $f_0(x) \in R[x]$ and where $a \in R$. (Prove!) It is natural to ask for the
relation, in terms of reducibility and irreducibility, of a polynomial in R[x]
considered as a polynomial in the larger ring F[x].

LEMMA 3.11.4 If f(x) in R[x] is both primitive and irreducible as an element
of R[x], then it is irreducible as an element of F[x]. Conversely, if the primitive
element f(x) in R[x] is irreducible as an element of F[x], it is also irreducible as an
element of R[x].

Proof. Suppose that the primitive element f(x) in R[x] is irreducible in
R[x] but is reducible in F[x]. Thus f(x) = g(x)h(x), where g(x), h(x) are in
F[x] and are of positive degree. Now $g(x) = g_0(x)/a$, $h(x) = h_0(x)/b$,
where $a, b \in R$ and where $g_0(x), h_0(x) \in R[x]$. Also $g_0(x) = \alpha g_1(x)$,
$h_0(x) = \beta h_1(x)$, where $\alpha = c(g_0)$, $\beta = c(h_0)$, and $g_1(x)$, $h_1(x)$ are primitive
in R[x]. Thus $f(x) = (\alpha\beta/ab) g_1(x) h_1(x)$, whence $ab f(x) = \alpha\beta g_1(x) h_1(x)$.
By Lemma 3.11.3, $g_1(x) h_1(x)$ is primitive, whence the content of the right-
hand side is $\alpha\beta$. Since f(x) is primitive, the content of the left-hand side is
ab; but then $ab = \alpha\beta$ (up to a unit of R, which can be absorbed into $g_1(x)$);
the implication of this is that $f(x) = g_1(x) h_1(x)$, and
we have obtained a nontrivial factorization of f(x) in R[x], contrary to
hypothesis. (Note: this factorization is nontrivial since $g_1(x)$, $h_1(x)$
are of the same degree as g(x), h(x), respectively, so cannot be units in R[x] (see Problem
4).) We leave the converse half of the lemma as an exercise.

LEMMA 3.11.5 If R is a unique factorization domain and if p(x) is a primitive
polynomial in R[x], then it can be factored in a unique way as the product of irreducible
elements in R[x].

Proof. When we consider p(x) as an element in F[x], by Lemma 3.9.5,
we can factor it as $p(x) = p_1(x) \cdots p_k(x)$, where $p_1(x), p_2(x), \ldots, p_k(x)$ are
irreducible polynomials in F[x]. Each $p_i(x) = f_i(x)/a_i$, where $f_i(x) \in R[x]$
and $a_i \in R$; moreover, $f_i(x) = c_i q_i(x)$, where $c_i = c(f_i)$ and where
$q_i(x)$ is primitive in R[x]. Thus each $p_i(x) = c_i q_i(x)/a_i$, where $a_i, c_i \in R$
and where $q_i(x) \in R[x]$ is primitive. Since $p_i(x)$ is irreducible in F[x],
$q_i(x)$ must also be irreducible in F[x], hence by Lemma 3.11.4 it is irreducible
in R[x].

Now

$p(x) = p_1(x) \cdots p_k(x) = \frac{c_1 c_2 \cdots c_k}{a_1 a_2 \cdots a_k} q_1(x) \cdots q_k(x)$,

whence $a_1 a_2 \cdots a_k p(x) = c_1 c_2 \cdots c_k q_1(x) \cdots q_k(x)$. Using the primitivity of
p(x) and of $q_1(x) \cdots q_k(x)$, we can read off the content of the left-hand
side as $a_1 a_2 \cdots a_k$ and that of the right-hand side as $c_1 c_2 \cdots c_k$. Thus
$a_1 a_2 \cdots a_k = c_1 c_2 \cdots c_k$, hence $p(x) = q_1(x) \cdots q_k(x)$. We have factored
p(x), in R[x], as a product of irreducible elements.

Can we factor it in another way? If $p(x) = r_1(x) \cdots r_k(x)$, where the
$r_i(x)$ are irreducible in R[x], by the primitivity of p(x), each $r_i(x)$ must be
primitive, hence irreducible in F[x] by Lemma 3.11.4. But by Lemma 3.9.5
we know unique factorization in F[x]; the net result of this is that the
$r_i(x)$ and the $q_i(x)$ are equal (up to associates) in some order, hence p(x)
has a unique factorization as a product of irreducibles in R[x].

We now have all the necessary information to prove the principal theorem 
of this section. 

THEOREM 3.11.1 If R is a unique factorization domain, then so is R[x].

Proof. Let f(x) be an arbitrary nonzero element in R[x]. We can write f(x) in
a unique way as $f(x) = c f_1(x)$ where c = c(f) is in R and where $f_1(x)$,
in R[x], is primitive. By Lemma 3.11.5 we can decompose $f_1(x)$ in a unique
way as the product of irreducible elements of R[x]. What about c?
Suppose that $c = a_1(x) a_2(x) \cdots a_m(x)$ in R[x]; then 0 = deg c =
$\deg (a_1(x)) + \deg (a_2(x)) + \cdots + \deg (a_m(x))$. Therefore, each $a_i(x)$ must
be of degree 0, that is, it must be an element of R. In other words, the
only factorizations of c as an element of R[x] are those it had as an element
of R. In particular, an irreducible element in R is still irreducible in R[x].
Since R is a unique factorization domain, c has a unique factorization as a
product of irreducible elements of R, hence of R[x].

Putting together the unique factorization of f(x) in the form $c f_1(x)$ where
$f_1(x)$ is primitive and where $c \in R$ with the unique factorizability of c and
of $f_1(x)$ we have proved the theorem.
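
The two stages of the proof can be followed on a small example. The illustration below is ours, with R taken to be the ring of integers and $f(x) = 12x^2 - 12$; it first splits off the content, then factors the content in R and the primitive part in R[x].

```latex
\begin{align*}
f(x) &= 12x^2 - 12 = c\, f_1(x), \qquad c = c(f) = 12, \quad f_1(x) = x^2 - 1, \\
c &= 2 \cdot 2 \cdot 3 \qquad \text{(irreducible elements of } R\text{)}, \\
f_1(x) &= (x - 1)(x + 1) \qquad \text{(primitive irreducible elements of } R[x]\text{)}, \\
f(x) &= 2 \cdot 2 \cdot 3 \cdot (x - 1)(x + 1).
\end{align*}
```

Any other factorization of f(x) into irreducibles of R[x] differs from this one only in the order of the factors and in multiplication of the factors by the units 1 and -1.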

Given R as a unique factorization domain, then $R_1 = R[x_1]$ is also a
unique factorization domain. Thus $R_2 = R_1[x_2] = R[x_1, x_2]$ is also a
unique factorization domain. Continuing in this pattern we obtain

COROLLARY 1 If R is a unique factorization domain then so is $R[x_1, \ldots, x_n]$.

A special case of Corollary 1 but of independent interest and importance is 

COROLLARY 2 If F is a field then $F[x_1, \ldots, x_n]$ is a unique factorization
domain.

Problems 

1. Prove that R[x] is a commutative ring with unit element whenever R is.

2. Prove that $R[x_1, \ldots, x_n] = R[x_{i_1}, \ldots, x_{i_n}]$, where $(i_1, \ldots, i_n)$ is a
permutation of (1, 2, ..., n).

3. If R is an integral domain, prove that for f(x), g(x) in R[x],
deg (f(x)g(x)) = deg (f(x)) + deg (g(x)).

4. If R is an integral domain with unit element, prove that any unit in
R[x] must already be a unit in R.

5. Let R be a commutative ring with no nonzero nilpotent elements (that
is, $a^n = 0$ implies a = 0). If $f(x) = a_0 + a_1 x + \cdots + a_m x^m$ in R[x]
is a zero-divisor, prove that there is an element $b \neq 0$ in R such that
$b a_0 = b a_1 = \cdots = b a_m = 0$.

*6. Do Problem 5 dropping the assumption that R has no nonzero nilpotent 
elements. 

*7. If R is a commutative ring with unit element, prove that
$a_0 + a_1 x + \cdots + a_n x^n$ in R[x] has an inverse in R[x] (i.e., is a unit in R[x]) if and
only if $a_0$ is a unit in R and $a_1, \ldots, a_n$ are nilpotent elements in R.

8. Prove that when F is a field, $F[x_1, x_2]$ is not a principal ideal ring.

9. Prove, completely, Lemma 3.11.2 and its corollary.

10. (a) If R is a unique factorization domain, prove that every $f(x) \in R[x]$
can be written as $f(x) = a f_1(x)$, where $a \in R$ and where $f_1(x)$ is
primitive.

(b) Prove that the decomposition in part (a) is unique (up to associates).

11. If R is an integral domain, and if F is its field of quotients, prove that
any element f(x) in F[x] can be written as $f(x) = f_0(x)/a$, where
$f_0(x) \in R[x]$ and where $a \in R$.

12. Prove the converse part of Lemma 3.11.4. 

13. Prove Corollary 2 to Theorem 3.11.1. 

14. Prove that a principal ideal ring is a unique factorization domain. 

15. If J is the ring of integers, prove that $J[x_1, \ldots, x_n]$ is a unique
factorization domain.

Supplementary Problems

1. Let R be a commutative ring; an ideal P of R is said to be a prime ideal
of R if $ab \in P$, $a, b \in R$ implies that $a \in P$ or $b \in P$. Prove that P is a
prime ideal of R if and only if R/P is an integral domain.

2. Let R be a commutative ring with unit element; prove that every 
maximal ideal of R is a prime ideal. 

3. Give an example of a ring in which some prime ideal is not a maximal 
ideal. 

4. If R is a finite commutative ring (i.e., has only a finite number of 
elements) with unit element, prove that every prime ideal of R is a 
maximal ideal of R. 

5. If F is a field, prove that F[x] is isomorphic to F[t].

6. Find all the automorphisms $\sigma$ of F[x] with the property that $\sigma(f) = f$
for every $f \in F$.

7. If R is a commutative ring, let $N = \{x \in R \mid x^n = 0 \text{ for some integer } n\}$.
Prove

(a) N is an ideal of R.

(b) In $\bar{R} = R/N$ if $\bar{x}^m = 0$ for some m then $\bar{x} = 0$.

8. Let R be a commutative ring and suppose that A is an ideal of R.
Let $N(A) = \{x \in R \mid x^n \in A \text{ for some } n\}$. Prove

(a) N(A) is an ideal of R which contains A.

(b) N(N(A)) = N(A).

N(A) is often called the radical of A.

9. If n is an integer, let $J_n$ be the ring of integers mod n. Describe N
(see Problem 7) for $J_n$ in terms of n.

10. If A and B are ideals in a ring R such that $A \cap B = (0)$, prove that
for every $a \in A$, $b \in B$, ab = 0.

11. If R is a ring, let $Z(R) = \{x \in R \mid xy = yx \text{ for all } y \in R\}$. Prove that
Z(R) is a subring of R.

12. If R is a division ring, prove that Z(R) is a field. 

13. Find a polynomial of degree 3 irreducible over the ring of integers,
$J_3$, mod 3. Use it to construct a field having 27 elements.

14. Construct a field having 625 elements. 

15. If F is a field and $p(x) \in F[x]$, prove that in the ring

$R = F[x]/(p(x))$,

N (see Problem 7) is (0) if and only if p(x) is not divisible by the square of
any polynomial.

16. Prove that the polynomial $f(x) = 1 + x + x^3 + x^4$ is not irreducible
over any field F.

17. Prove that the polynomial $f(x) = x^4 + 2x + 2$ is irreducible over
the field of rational numbers.

18. Prove that if F is a finite field, its characteristic must be a prime number
p and F contains $p^n$ elements for some integer n. Prove further that if
$a \in F$ then $a^{p^n} = a$.

19. Prove that any nonzero ideal in the Gaussian integers J[i] must contain
some positive integer.

20. Prove that if R is a ring in which $a^4 = a$ for every $a \in R$ then R must
be commutative.

21. Let R and R' be rings and $\phi$ a mapping from R into R' satisfying

(a) $\phi(x + y) = \phi(x) + \phi(y)$ for every $x, y \in R$.

(b) $\phi(xy) = \phi(x)\phi(y)$ or $\phi(y)\phi(x)$.

Prove that for all $a, b \in R$, $\phi(ab) = \phi(a)\phi(b)$ or that, for all $a, b \in R$,
$\phi(ab) = \phi(b)\phi(a)$. (Hint: If $a \in R$, let

$W_a = \{x \in R \mid \phi(ax) = \phi(a)\phi(x)\}$

and

$U_a = \{x \in R \mid \phi(ax) = \phi(x)\phi(a)\}$.)

22. Let R be a ring with a unit element, 1, in which $(ab)^2 = a^2 b^2$ for
all $a, b \in R$. Prove that R must be commutative.

23. Give an example of a noncommutative ring (of course, without 1) in
which $(ab)^2 = a^2 b^2$ for all elements a and b.

24. (a) Let R be a ring with unit element 1 such that $(ab)^2 = (ba)^2$ for
all $a, b \in R$. If in R, 2x = 0 implies x = 0, prove that R must be
commutative.

(b) Show that the result of (a) may be false if 2x = 0 for some $x \neq 0$
in R.

(c) Even if 2x = 0 implies x = 0 in R, show that the result of (a)
may be false if R does not have a unit element.

25. Let R be a ring in which $x^n = 0$ implies x = 0. If $(ab)^2 = a^2 b^2$
for all $a, b \in R$, prove that R is commutative.

26. Let R be a ring in which $x^n = 0$ implies x = 0. If $(ab)^2 = (ba)^2$
for all $a, b \in R$, prove that R must be commutative.

27. Let $p_1, p_2, \ldots, p_k$ be distinct primes, and let $n = p_1 p_2 \cdots p_k$. If R is
the ring of integers modulo n, show that there are exactly $2^k$ elements
a in R such that $a^2 = a$.

28. Construct a polynomial $q(x) \neq 0$ with integer coefficients which has
no rational roots but is such that for any prime p we can solve the
congruence $q(x) \equiv 0 \bmod p$ in the integers.

Vector Spaces and Modules


Up to this point we have been introduced to groups and to rings; the 
former has its motivation in the set of one-to-one mappings of a set 
onto itself, the latter, in the set of integers. The third algebraic model 
which we are about to consider—vector space—can, in large part, 
trace its origins to topics in geometry and physics. 

Its description will be reminiscent of those of groups and rings—in 
fact, part of its structure is that of an abelian group—but a vector 
space differs from these previous two structures in that one of the 
products defined on it uses elements outside of the set itself. These 
remarks will become clear when we make the definition of a vector 
space. 

Vector spaces owe their importance to the fact that so many models 
arising in the solutions of specific problems turn out to be vector 
spaces. For this reason the basic concepts introduced in them have a 
certain universality and are ones we encounter, and keep encountering, 
in so many diverse contexts. Among these fundamental notions are 
those of linear dependence, basis, and dimension which will be
developed in this chapter. These are potent and effective tools in all
branches of mathematics; we shall make immediate and free use of 
these in many key places in Chapter 5 which treats the theory of fields. 

Intimately intertwined with vector spaces are the homomorphisms 
of one vector space into another (or into itself). These will make up 
the bulk of the subject matter to be considered in Chapter 6. 

In the last part of the present chapter we generalize from vector spaces
to modules; roughly speaking, a module is a vector space over a ring instead
of over a field. For finitely generated modules over Euclidean rings we
shall prove the fundamental basis theorem. This result allows us to give a
complete description and construction of all abelian groups which are
generated by a finite number of elements.


4.1 Elementary Basic Concepts 

DEFINITION A nonempty set V is said to be a vector space over a field F
if V is an abelian group under an operation which we denote by +, and
if for every $\alpha \in F$, $v \in V$ there is defined an element, written $\alpha v$, in V subject
to

1. $\alpha(v + w) = \alpha v + \alpha w$;

2. $(\alpha + \beta)v = \alpha v + \beta v$;

3. $\alpha(\beta v) = (\alpha\beta)v$;

4. $1v = v$;

for all $\alpha, \beta \in F$, $v, w \in V$ (where the 1 represents the unit element of F
under multiplication).


Note that in Axiom 1 above the + is that of V , whereas on the left-hand 
side of Axiom 2 it is that of F and on the right-hand side, that of V. 

We shall consistently use the following notations: 

a. F will be a field. 

b. Lowercase Greek letters will be elements of F; we shall often refer to
elements of F as scalars.

c. Capital Latin letters will denote vector spaces over F. 

d. Lowercase Latin letters will denote elements of vector spaces. We shall 
often call elements of a vector space vectors. 


If we ignore the fact that V has two operations defined on it and view it
for a moment merely as an abelian group under +, Axiom 1 states nothing
more than the fact that multiplication of the elements of V by a fixed scalar
$\alpha$ defines a homomorphism of the abelian group V into itself. From Lemma
4.1.1 which is to follow, if $\alpha \neq 0$ this homomorphism can be shown to be
an isomorphism of V onto V.

This suggests that many aspects of the theory of vector spaces (and of
rings, too) could have been developed as a part of the theory of groups,
had we generalized the notion of a group to that of a group with operators.
For students already familiar with a little abstract algebra, this is the
preferred point of view; since we assumed no familiarity on the reader’s part
with any abstract algebra, we felt that such an approach might lead to a
too sudden introduction to the ideas of the subject with no experience to
act as a guide.

Example 4.1.1 Let F be a field and let K be a field which contains F as
a subfield. We consider K as a vector space over F, using as the + of the
vector space the addition of elements of K, and by defining, for $\alpha \in F$,
$v \in K$, $\alpha v$ to be the product of $\alpha$ and v as elements in the field K. Axioms
1, 2, 3 for a vector space are then consequences of the right-distributive
law, left-distributive law, and associative law, respectively, which hold for
K as a ring.

Example 4.1.2 Let F be a field and let V be the totality of all ordered
n-tuples, $(\alpha_1, \ldots, \alpha_n)$ where the $\alpha_i \in F$. Two elements $(\alpha_1, \ldots, \alpha_n)$ and
$(\beta_1, \ldots, \beta_n)$ of V are declared to be equal if and only if $\alpha_i = \beta_i$ for each
i = 1, 2, ..., n. We now introduce the requisite operations in V to make
of it a vector space by defining:

1. $(\alpha_1, \ldots, \alpha_n) + (\beta_1, \ldots, \beta_n) = (\alpha_1 + \beta_1, \alpha_2 + \beta_2, \ldots, \alpha_n + \beta_n)$.

2. $\gamma(\alpha_1, \ldots, \alpha_n) = (\gamma\alpha_1, \ldots, \gamma\alpha_n)$ for $\gamma \in F$.

It is easy to verify that with these operations, V is a vector space over F.
Since it will keep reappearing, we assign a symbol to it, namely $F^{(n)}$.
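
For a finite field the verification just mentioned can even be carried out exhaustively. The sketch below is an illustration of ours under stated assumptions: F is taken to be the integers mod 5, n = 2, and the names `add`, `scale`, and `axioms_hold` are hypothetical. It checks the four scalar axioms of the definition for every choice of scalars and vectors; that V is an abelian group under componentwise addition is taken for granted.

```python
from itertools import product

P = 5  # F is the field of integers mod 5 (an assumption for this example)
N = 2  # vectors are ordered pairs


def add(v, w):
    """Componentwise addition in F^(n)."""
    return tuple((a + b) % P for a, b in zip(v, w))


def scale(c, v):
    """Multiplication of the vector v by the scalar c."""
    return tuple((c * a) % P for a in v)


vectors = list(product(range(P), repeat=N))
scalars = range(P)


def axioms_hold():
    """Check Axioms 1 through 4 of a vector space by brute force."""
    for a in scalars:
        for v in vectors:
            for w in vectors:
                # Axiom 1: a(v + w) = av + aw
                if scale(a, add(v, w)) != add(scale(a, v), scale(a, w)):
                    return False
    for a in scalars:
        for b in scalars:
            for v in vectors:
                # Axiom 2: (a + b)v = av + bv
                if scale((a + b) % P, v) != add(scale(a, v), scale(b, v)):
                    return False
                # Axiom 3: a(bv) = (ab)v
                if scale(a, scale(b, v)) != scale((a * b) % P, v):
                    return False
    # Axiom 4: 1v = v
    return all(scale(1, v) == v for v in vectors)
```

Calling `axioms_hold()` runs through all 25 vectors and 5 scalars; for an infinite field the same identities must of course be proved rather than enumerated.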

Example 4.1.3 Let F be any field and let V = F[x], the set of
polynomials in x over F. We choose to ignore, at present, the fact that in F[x]
we can multiply any two elements, and merely concentrate on the fact that
two polynomials can be added and that a polynomial can always be
multiplied by an element of F. With these natural operations F[x] is a vector
space over F.

Example 4.1.4 In F[x] let $V_n$ be the set of all polynomials of degree less
than n. Using the natural operations for polynomials of addition and
multiplication by elements of F, $V_n$ is a vector space over F.

What is the relation of Example 4.1.4 to Example 4.1.2? Any element of
$V_n$ is of the form $\alpha_0 + \alpha_1 x + \cdots + \alpha_{n-1} x^{n-1}$, where $\alpha_i \in F$; if we map
this element onto the element $(\alpha_0, \alpha_1, \ldots, \alpha_{n-1})$ in $F^{(n)}$ we could reasonably
expect, once homomorphism and isomorphism have been defined, to find
that $V_n$ and $F^{(n)}$ are isomorphic as vector spaces.
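
The map just described is easy to experiment with. The following sketch is ours: F is taken to be the rationals (via `fractions.Fraction`), a polynomial is stored as a dictionary from exponents to coefficients, and `to_tuple`, `poly_add`, and `poly_scale` are hypothetical names. It sends a polynomial of degree less than n to its n-tuple of coefficients, so that one can check on examples that the map respects addition and multiplication by scalars.

```python
from fractions import Fraction


def to_tuple(poly, n):
    """Send a_0 + a_1 x + ... + a_{n-1} x^{n-1} to (a_0, ..., a_{n-1}).

    poly is a dict {exponent: coefficient}; every exponent must be < n.
    """
    if any(e >= n for e in poly):
        raise ValueError("polynomial is not in V_n")
    return tuple(Fraction(poly.get(i, 0)) for i in range(n))


def poly_add(p, q):
    """Sum of two polynomials."""
    return {e: p.get(e, 0) + q.get(e, 0) for e in set(p) | set(q)}


def poly_scale(c, p):
    """Product of the polynomial p by the scalar c."""
    return {e: c * a for e, a in p.items()}


p = {0: 1, 2: 3}    # 1 + 3x^2
q = {1: 2, 2: -3}   # 2x - 3x^2

image_of_sum = to_tuple(poly_add(p, q), 3)
sum_of_images = tuple(a + b for a, b in zip(to_tuple(p, 3), to_tuple(q, 3)))
image_of_multiple = to_tuple(poly_scale(Fraction(1, 2), p), 3)
```

The image of p + q equals the sum of the images, and the image of (1/2)p is (1/2) times the image of p; these are exactly the two conditions that the definition of a homomorphism will require.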

DEFINITION If V is a vector space over F and if $W \subset V$, then W is a
subspace of V if under the operations of V, W, itself, forms a vector space
over F. Equivalently, W is a subspace of V whenever $w_1, w_2 \in W$,
$\alpha, \beta \in F$ implies that $\alpha w_1 + \beta w_2 \in W$.

Note that the vector space defined in Example 4.1.4 is a subspace of that
defined in Example 4.1.3. Additional examples of vector spaces and
subspaces can be found in the problems at the end of this section.

DEFINITION If U and V are vector spaces over F then the mapping T
of U into V is said to be a homomorphism if

1. $(u_1 + u_2)T = u_1 T + u_2 T$;

2. $(\alpha u_1)T = \alpha(u_1 T)$;

for all $u_1, u_2 \in U$, and all $\alpha \in F$.

As in our previous models, a homomorphism is a mapping preserving
all the algebraic structure of our system.

If T, in addition, is one-to-one, we call it an isomorphism. The kernel of
T is defined as $\{u \in U \mid uT = 0\}$ where 0 is the identity element of the
addition in V. It is an exercise that the kernel of T is a subspace of U and
that T is an isomorphism if and only if its kernel is (0). Two vector spaces
are said to be isomorphic if there is an isomorphism of one onto the other.

The set of all homomorphisms of U into V will be written as Hom (U, V).
Of particular interest to us will be two special cases, Hom (U, F) and
Hom (U, U). We shall study the first of these soon; the second, which can be
shown to be a ring, is called the ring of linear transformations on U. A great
deal of our time, later in this book, will be occupied with a detailed study
of Hom (U, U).

We begin the material proper with an operational lemma which, as in
the case of rings, will allow us to carry out certain natural and simple
computations in vector spaces. In the statement of the lemma, 0 represents
the zero of the addition in V, o that of the addition in F, and -v the
additive inverse of the element v of V.


LEMMA 4.1.1 If V is a vector space over F then

1. $\alpha 0 = 0$ for $\alpha \in F$.

2. $ov = 0$ for $v \in V$.

3. $(-\alpha)v = -(\alpha v)$ for $\alpha \in F$, $v \in V$.

4. If $v \neq 0$, then $\alpha v = 0$ implies that $\alpha = o$.

Proof. The proof is very easy and follows the lines of the analogous
results proved for rings; for this reason we give it briefly and with few
explanations.

1. Since $\alpha 0 = \alpha(0 + 0) = \alpha 0 + \alpha 0$, we get $\alpha 0 = 0$.

2. Since $ov = (o + o)v = ov + ov$, we get $ov = 0$.

3. Since $0 = (\alpha + (-\alpha))v = \alpha v + (-\alpha)v$, $(-\alpha)v = -(\alpha v)$.

4. If $\alpha v = 0$ and $\alpha \neq o$ then

$0 = \alpha^{-1} 0 = \alpha^{-1}(\alpha v) = (\alpha^{-1}\alpha)v = 1v = v$.

The lemma just proved shows that multiplication by the zero of V or of 
F always leads us to the zero of V. Thus there will be no danger of confusion 
in using the same symbol for both of these, and we henceforth will merely 
use the symbol 0 to represent both of them. 

Let V be a vector space over F and let W be a subspace of V. Considering
these merely as abelian groups construct the quotient group V/W; its
elements are the cosets v + W where $v \in V$. The commutativity of the
addition, from what we have developed in Chapter 2 on group theory,
assures us that V/W is an abelian group. We intend to make of it a vector
space. If $\alpha \in F$, $v + W \in V/W$, define $\alpha(v + W) = \alpha v + W$. As is usual,
we must first show that this product is well defined; that is, if v + W =
v' + W then $\alpha(v + W) = \alpha(v' + W)$. Now, because v + W = v' + W,
v - v' is in W; since W is a subspace, $\alpha(v - v')$ must also be in W. Using
part 3 of Lemma 4.1.1 (see Problem 1) this says that $\alpha v - \alpha v' \in W$ and so
$\alpha v + W = \alpha v' + W$. Thus $\alpha(v + W) = \alpha v + W = \alpha v' + W = \alpha(v' + W)$;
the product has been shown to be well defined. The verification of the
vector-space axioms for V/W is routine and we leave it as an exercise.
We have shown

LEMMA 4.1.2 If V is a vector space over F and if W is a subspace of V, then
V/W is a vector space over F, where, for $v_1 + W$, $v_2 + W \in V/W$ and $\alpha \in F$,

1. $(v_1 + W) + (v_2 + W) = (v_1 + v_2) + W$.

2. $\alpha(v_1 + W) = \alpha v_1 + W$.

V/W is called the quotient space of V by W.
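
A small finite example makes the cosets and the well-definedness argument tangible. The sketch below is ours, under stated assumptions: F is the field of integers mod 3, V is $F^{(2)}$, W is the subspace of all multiples of (1, 1), and `coset` and `scalar_product_well_defined` are hypothetical names. A coset v + W is represented as the set of its elements, so two representatives give the same coset exactly when the sets are equal.

```python
from itertools import product

P = 3  # F is the field of integers mod 3 (an assumption for this example)
V = list(product(range(P), repeat=2))
W = [(c, c) for c in range(P)]  # the subspace of all multiples of (1, 1)


def add(v, w):
    return tuple((a + b) % P for a, b in zip(v, w))


def scale(c, v):
    return tuple((c * a) % P for a in v)


def coset(v):
    """The coset v + W, as a set of vectors."""
    return frozenset(add(v, w) for w in W)


cosets = {coset(v) for v in V}


def scalar_product_well_defined():
    """Check that a(v + W) = av + W does not depend on the representative v."""
    for a in range(P):
        for v in V:
            for v2 in coset(v):  # every other representative of v + W
                if coset(scale(a, v)) != coset(scale(a, v2)):
                    return False
    return True
```

Here the 9 vectors of V fall into 3 cosets of 3 elements each, and replacing a representative by another one from the same coset never changes the coset of the scalar multiple.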

Without further ado we now state the first homomorphism theorem for 
vector spaces; we give no proofs but refer the reader back to the proof of 
Theorem 2.7.1. 

THEOREM 4.1.1 If T is a homomorphism of U onto V with kernel W, then V
is isomorphic to U/W. Conversely, if U is a vector space and W a subspace of U,
then there is a homomorphism of U onto U/W.

The other homomorphism theorems will be found as exercises at the end 
of this section. 

DEFINITION Let V be a vector space over F and let $U_1, \ldots, U_n$ be
subspaces of V. V is said to be the internal direct sum of $U_1, \ldots, U_n$ if every
element $v \in V$ can be written in one and only one way as $v = u_1 + u_2 +
\cdots + u_n$ where $u_i \in U_i$.

Given any finite number of vector spaces over F, $V_1, \ldots, V_n$, consider
the set V of all ordered n-tuples $(v_1, \ldots, v_n)$ where $v_i \in V_i$. We declare two
elements $(v_1, \ldots, v_n)$ and $(v_1', \ldots, v_n')$ of V to be equal if and only if for
each i, $v_i = v_i'$. We add two such elements by defining $(v_1, \ldots, v_n) +
(w_1, \ldots, w_n)$ to be $(v_1 + w_1, v_2 + w_2, \ldots, v_n + w_n)$. Finally, if $\alpha \in F$
and $(v_1, \ldots, v_n) \in V$ we define $\alpha(v_1, \ldots, v_n)$ to be $(\alpha v_1, \alpha v_2, \ldots, \alpha v_n)$.
To check that the axioms for a vector space hold for V with its operations
as defined above is straightforward. Thus V itself is a vector space over F.
We call V the external direct sum of $V_1, \ldots, V_n$ and denote it by writing
$V = V_1 \oplus \cdots \oplus V_n$.


THEOREM 4.1.2 If V is the internal direct sum of $U_1, \ldots, U_n$, then V is
isomorphic to the external direct sum of $U_1, \ldots, U_n$.

Proof. Given $v \in V$, v can be written, by assumption, in one and only
one way as $v = u_1 + u_2 + \cdots + u_n$ where $u_i \in U_i$; define the mapping
T of V into $U_1 \oplus \cdots \oplus U_n$ by $vT = (u_1, \ldots, u_n)$. Since v has a unique
representation of this form, T is well defined. It clearly is onto, for the
arbitrary element $(w_1, \ldots, w_n) \in U_1 \oplus \cdots \oplus U_n$ is wT where $w = w_1 +
\cdots + w_n \in V$. We leave the proof of the fact that T is one-to-one and a
homomorphism to the reader.

Because of the isomorphism proved in Theorem 4.1.2 we shall henceforth 
merely refer to a direct sum, not qualifying that it be internal or external. 


Problems 

1. In a vector space show that $\alpha(v - w) = \alpha v - \alpha w$.

2. Prove that the vector spaces in Example 4.1.4 and Example 4.1.2 are 
isomorphic. 

3. Prove that the kernel of a homomorphism is a subspace. 

4. (a) If F is the field of real numbers show that the set of real-valued,
continuous functions on the closed interval [0, 1] forms a vector
space over F.

(b) Show that those functions in part (a) for which all nth derivatives
exist for n = 1, 2, ... form a subspace.

5. (a) Let F be the field of all real numbers and let V be the set of all
sequences $(a_1, a_2, \ldots, a_n, \ldots)$, $a_i \in F$, where equality, addition
and scalar multiplication are defined componentwise. Prove that
V is a vector space over F.

(b) Let $W = \{(a_1, \ldots, a_n, \ldots) \in V \mid \lim_{n \to \infty} a_n = 0\}$. Prove that W
is a subspace of V.

*(c) Let $U = \{(a_1, \ldots, a_n, \ldots) \in V \mid \sum_{i=1}^{\infty} a_i^2 \text{ is finite}\}$. Prove that U is
a subspace of V and is contained in W.

6. If U and V are vector spaces over F, define an addition and a
multiplication by scalars in Hom (U, V) so as to make Hom (U, V) into a
vector space over F.

*7. Using the result of Problem 6 prove that Hom ($F^{(n)}$, $F^{(m)}$) is isomorphic
to $F^{(nm)}$ as a vector space.

8. If n > m prove that there is a homomorphism of $F^{(n)}$ onto $F^{(m)}$ with
a kernel W which is isomorphic to $F^{(n-m)}$.

9. If $v \neq 0 \in F^{(n)}$ prove that there is an element $T \in$ Hom ($F^{(n)}$, F)
such that $vT \neq 0$.

10. Prove that there exists an isomorphism of $F^{(n)}$ into
Hom (Hom ($F^{(n)}$, F), F).

11. If U and W are subspaces of V, prove that $U + W = \{v \in V \mid v =
u + w, u \in U, w \in W\}$ is a subspace of V.

12. Prove that the intersection of two subspaces of V is a subspace of V. 

13. If A and B are subspaces of V prove that (A + B)/B is isomorphic to
$A/(A \cap B)$.

14. If T is a homomorphism of U onto V with kernel W prove that there 
is a one-to-one correspondence between the subspaces of V and the 
subspaces of U which contain W. 

15. Let V be a vector space over F and let $V_1, \ldots, V_n$ be subspaces of
V. Suppose that $V = V_1 + V_2 + \cdots + V_n$ (see Problem 11), and
that $V_i \cap (V_1 + \cdots + V_{i-1} + V_{i+1} + \cdots + V_n) = (0)$ for every
i = 1, 2, ..., n. Prove that V is the internal direct sum of $V_1, \ldots, V_n$.

16. Let $V = V_1 \oplus \cdots \oplus V_n$; prove that in V there are subspaces $\bar{V}_i$
isomorphic to $V_i$ such that V is the internal direct sum of the $\bar{V}_i$.

17. Let T be defined on $F^{(2)}$ by $(x_1, x_2)T = (\alpha x_1 + \beta x_2, \gamma x_1 + \delta x_2)$
where $\alpha, \beta, \gamma, \delta$ are some fixed elements in F.

(a) Prove that T is a homomorphism of $F^{(2)}$ into itself.

(b) Find necessary and sufficient conditions on $\alpha, \beta, \gamma, \delta$ so that T is
an isomorphism.

18. Let T be defined on $F^{(3)}$ by $(x_1, x_2, x_3)T = (\alpha_{11} x_1 + \alpha_{12} x_2 +
\alpha_{13} x_3, \alpha_{21} x_1 + \alpha_{22} x_2 + \alpha_{23} x_3, \alpha_{31} x_1 + \alpha_{32} x_2 + \alpha_{33} x_3)$. Show that T
is a homomorphism of $F^{(3)}$ into itself and determine necessary and
sufficient conditions on the $\alpha_{ij}$ so that T is an isomorphism.

19. Let T be a homomorphism of V into W. Using T, define a
homomorphism T* of Hom (W, F) into Hom (V, F).

20. (a) Prove that $F^{(1)}$ is not isomorphic to $F^{(n)}$ for n > 1.

(b) Prove that $F^{(2)}$ is not isomorphic to $F^{(3)}$.

21. If V is a vector space over an infinite field F, prove that V cannot be 
written as the set-theoretic union of a finite number of proper subspaces. 

4.2 Linear Independence and Bases 

If we look somewhat more closely at two of the examples described in the
previous section, namely Example 4.1.4 and Example 4.1.3, we notice that
although they do have many properties in common there is one striking
difference between them. This difference lies in the fact that in the former
we can find a finite number of elements, $1, x, x^2, \ldots, x^{n-1}$, such that every
element can be written as a combination of these with coefficients from F,
whereas in the latter no such finite set of elements exists.

We now intend to examine, in some detail, vector spaces which can be 
generated, as was the space in Example 4.1.4, by a finite set of elements. 

DEFINITION If V is a vector space over F and if $v_1, \ldots, v_n \in V$ then
any element of the form $\alpha_1 v_1 + \alpha_2 v_2 + \cdots + \alpha_n v_n$, where the $\alpha_i \in F$, is a
linear combination over F of $v_1, \ldots, v_n$.

Since we usually are working with some fixed field F we shall often say
linear combination rather than linear combination over F. Similarly it will
be understood that when we say vector space we mean vector space over F.

DEFINITION If S is a nonempty subset of the vector space V, then L(S), 
the linear span of S, is the set of all linear combinations of finite sets of 
elements of S. 


We put, after all, into L(S) the elements required by the axioms of a
vector space, so it is not surprising to find

LEMMA 4.2.1 L(S) is a subspace of V. 

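Proof. If v and w are in L(S), then $v = \lambda_1 s_1 + \cdots + \lambda_n s_n$ and $w =
\mu_1 t_1 + \cdots + \mu_m t_m$, where the $\lambda$'s and $\mu$'s are in F and the $s_i$ and $t_i$ are all
in S. Thus, for $\alpha, \beta \in F$, $\alpha v + \beta w = \alpha(\lambda_1 s_1 + \cdots + \lambda_n s_n) + \beta(\mu_1 t_1 +
\cdots + \mu_m t_m) = (\alpha\lambda_1)s_1 + \cdots + (\alpha\lambda_n)s_n + (\beta\mu_1)t_1 + \cdots + (\beta\mu_m)t_m$ and so
is again in L(S). L(S) has been shown to be a subspace of V.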

The proof of each part of the next lemma is straightforward and easy
and we leave the proofs as exercises to the reader.

LEMMA 4.2.2 If S, T are subsets of V, then

1. $S \subset T$ implies $L(S) \subset L(T)$.

2. $L(S \cup T) = L(S) + L(T)$.

3. L(L(S)) = L(S).

DEFINITION The vector space V is said to be finite-dimensional (over F) 
if there is a finite subset S in V such that V = L(S). 

Note that $F^{(n)}$ is finite-dimensional over F, for if S consists of the n vectors
$(1, 0, \ldots, 0)$, $(0, 1, 0, \ldots, 0)$, ..., $(0, 0, \ldots, 0, 1)$, then V = L(S).

Although we have defined what is meant by a finite-dimensional space 
we have not, as yet, defined what is meant by the dimension of a space. 
This will come shortly. 

DEFINITION If V is a vector space and if $v_1, \ldots, v_n$ are in V, we say that
they are linearly dependent over F if there exist elements $\lambda_1, \ldots, \lambda_n$ in F,
not all of them 0, such that $\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_n v_n = 0$.

If the vectors $v_1, \ldots, v_n$ are not linearly dependent over F, they are said
to be linearly independent over F. Here too we shall often contract the phrase
“linearly dependent over F” to “linearly dependent.” Note that if $v_1, \ldots, v_n$
are linearly independent then none of them can be 0, for if $v_1 = 0$,
say, then $\alpha v_1 + 0v_2 + \cdots + 0v_n = 0$ for any $\alpha \neq 0$ in F.

In $F^{(3)}$ it is easy to verify that (1, 0, 0), (0, 1, 0), and (0, 0, 1) are linearly
independent while (1, 1, 0), (3, 1, 3), and (5, 3, 3) are linearly dependent.
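
The verification just mentioned can also be carried out mechanically. The following is a minimal sketch, assuming F is the field of rational numbers (an assumption made here so that exact arithmetic is available); the helper names `rank` and `linearly_independent` are introduced only for this illustration. It reduces the vectors by Gaussian elimination and compares the number of pivots with the number of vectors.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of rational vectors, found by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear the entries below the pivot.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def linearly_independent(vectors):
    """The vectors are independent exactly when the rank equals their number."""
    return rank(vectors) == len(vectors)


print(linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))  # True
print(linearly_independent([(1, 1, 0), (3, 1, 3), (5, 3, 3)]))  # False
```

The second set is dependent because 2(1, 1, 0) + (3, 1, 3) - (5, 3, 3) = 0, a relation that holds over any field.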

We point out that linear dependence is a function not only of the vectors 
but also of the field. For instance, the field of complex numbers is a vector 
space over the field of real numbers and it is also a vector space over the 
field of complex numbers. The elements $v_1 = 1$, $v_2 = i$ in it are linearly
independent over the reals but are linearly dependent over the complexes,
since $i v_1 + (-1) v_2 = 0$.

The concept of linear dependence is an absolutely basic and ultra- 
important one. We now look at some of its properties. 

LEMMA 4.2.3 If $v_1, \ldots, v_n \in V$ are linearly independent, then every element in
their linear span has a unique representation in the form $\lambda_1 v_1 + \cdots + \lambda_n v_n$ with
the $\lambda_i \in F$.

Proof. By definition, every element in the linear span is of the form
$\lambda_1 v_1 + \cdots + \lambda_n v_n$. To show uniqueness we must demonstrate that if
$\lambda_1 v_1 + \cdots + \lambda_n v_n = \mu_1 v_1 + \cdots + \mu_n v_n$ then
$\lambda_1 = \mu_1, \lambda_2 = \mu_2, \ldots, \lambda_n = \mu_n$.
But if $\lambda_1 v_1 + \cdots + \lambda_n v_n = \mu_1 v_1 + \cdots + \mu_n v_n$ then we certainly have
$(\lambda_1 - \mu_1)v_1 + (\lambda_2 - \mu_2)v_2 + \cdots + (\lambda_n - \mu_n)v_n = 0$, which by the linear
independence of $v_1, \ldots, v_n$ forces $\lambda_1 - \mu_1 = 0$, $\lambda_2 - \mu_2 = 0$, ...,
$\lambda_n - \mu_n = 0$.

The next theorem, although very easy and at first glance of a somewhat
technical nature, has as consequences results which form the very foundations
of the subject. We shall list some of these as corollaries; the others will
appear in the succession of lemmas and theorems that are to follow.

THEOREM 4.2.1 If $v_1, \ldots, v_n$ are in V then either they are linearly independent
or some $v_k$ is a linear combination of the preceding ones, $v_1, \ldots, v_{k-1}$.

Proof. If $v_1, \ldots, v_n$ are linearly independent there is, of course, nothing
to prove. Suppose then that $\alpha_1 v_1 + \cdots + \alpha_n v_n = 0$ where not all the
$\alpha$'s are 0. Let k be the largest integer for which $\alpha_k \neq 0$. Since $\alpha_i = 0$
for $i > k$, $\alpha_1 v_1 + \cdots + \alpha_k v_k = 0$ which, since $\alpha_k \neq 0$, implies that
$v_k = \alpha_k^{-1}(-\alpha_1 v_1 - \alpha_2 v_2 - \cdots - \alpha_{k-1} v_{k-1})
= (-\alpha_k^{-1}\alpha_1)v_1 + \cdots + (-\alpha_k^{-1}\alpha_{k-1})v_{k-1}$.
Thus $v_k$ is a linear combination of its predecessors.

COROLLARY 1 If $v_1, \ldots, v_n$ in V have W as linear span and if $v_1, \ldots, v_k$
are linearly independent, then we can find a subset of $v_1, \ldots, v_n$ of the form
$v_1, v_2, \ldots, v_k, v_{i_1}, \ldots, v_{i_r}$ consisting of linearly independent elements
whose linear span is also W.

Proof. If $v_1, \ldots, v_n$ are linearly independent we are done. If not, weed
out from this set the first $v_j$ which is a linear combination of its predecessors.
Since $v_1, \ldots, v_k$ are linearly independent, $j > k$. The subset so constructed,
$v_1, \ldots, v_k, \ldots, v_{j-1}, v_{j+1}, \ldots, v_n$ has n - 1 elements. Clearly its linear
span is contained in W. However, we claim that it is actually equal to W;
for, given $w \in W$, w can be written as a linear combination of $v_1, \ldots, v_n$.
But in this linear combination we can replace $v_j$ by a linear combination of
$v_1, \ldots, v_{j-1}$. That is, w is a linear combination of
$v_1, \ldots, v_{j-1}, v_{j+1}, \ldots, v_n$.

Continuing this weeding out process, we reach a subset
$v_1, \ldots, v_k, v_{i_1}, \ldots, v_{i_r}$ whose linear span is still W but in which no
element is a linear combination of the preceding ones. By Theorem 4.2.1 the
elements $v_1, \ldots, v_k, v_{i_1}, \ldots, v_{i_r}$ must be linearly independent.
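
The weeding-out process in this proof is effectively an algorithm. The following is a minimal sketch of it, assuming F is the field of rational numbers and that vectors are given as tuples of coordinates; the function names `rank` and `sift` are hypothetical and chosen only for this illustration. A vector is kept exactly when it is not a linear combination of the vectors kept before it, which gives the same surviving subset as removing offenders one at a time.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of rational vectors, found by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def sift(vectors):
    """Keep each vector that is not a linear combination of those kept before it."""
    kept = []
    for v in vectors:
        # Adding v raises the rank exactly when v lies outside the span of `kept`.
        if rank(kept + [v]) > len(kept):
            kept.append(v)
    return kept


spanning = [(1, 1, 0), (3, 1, 3), (5, 3, 3), (0, 0, 1)]
print(sift(spanning))  # [(1, 1, 0), (3, 1, 3), (0, 0, 1)]
```

Here (5, 3, 3) is discarded because it equals 2(1, 1, 0) + (3, 1, 3), and the three survivors have the same linear span as the original four vectors.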

COROLLARY 2 If V is a finite-dimensional vector space, then it contains a
finite set $v_1, \ldots, v_n$ of linearly independent elements whose linear span is V.

Proof. Since V is finite-dimensional, it is the linear span of a finite
number of elements $u_1, \ldots, u_m$. By Corollary 1 we can find a subset of
these, denoted by $v_1, \ldots, v_n$, consisting of linearly independent elements
whose linear span must also be V.

DEFINITION A subset S of a vector space V is called a basis of V if S
consists of linearly independent elements (that is, any finite number of
elements in S is linearly independent) and V = L(S).

In this terminology we can rephrase Corollary 2 as 

COROLLARY 3 If V is a finite-dimensional vector space and if $u_1, \ldots, u_m$
span V then some subset of $u_1, \ldots, u_m$ forms a basis of V.

Corollary 3 asserts that a finite-dimensional vector space has a basis
containing a finite number of elements $v_1, \ldots, v_n$. Together with Lemma
4.2.3 this tells us that every element in V has a unique representation in the
form $\alpha_1 v_1 + \cdots + \alpha_n v_n$ with $\alpha_1, \ldots, \alpha_n$ in F.

Let us see some of the heuristic implications of these remarks. Suppose
that V is a finite-dimensional vector space over F; as we have seen above,
V has a basis $v_1, \ldots, v_n$. Thus every element $v \in V$ has a unique
representation in the form $v = \alpha_1 v_1 + \cdots + \alpha_n v_n$. Let us map V into $F^{(n)}$ by
defining the image of $\alpha_1 v_1 + \cdots + \alpha_n v_n$ to be $(\alpha_1, \ldots, \alpha_n)$. By the
uniqueness of representation in this form, the mapping is well defined, one-to-one,
and onto; it can be shown to have all the requisite properties of an
isomorphism. Thus V is isomorphic to $F^{(n)}$ for some n, where in fact n is
the number of elements in some basis of V over F. If some other basis of
V should have m elements, by the same token V would be isomorphic to
$F^{(m)}$. Since both $F^{(n)}$ and $F^{(m)}$ would now be isomorphic to V, they would
be isomorphic to each other.
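
To make the coordinate map concrete, here is a small worked instance. It assumes, purely for illustration, that V is the space of polynomials of degree less than 3 over F with the basis $1, x, x^2$; neither this choice of space nor the particular polynomials come from the discussion above.

```latex
\begin{aligned}
2 - x + 5x^2 &= 2\cdot 1 + (-1)\cdot x + 5\cdot x^2 \;\longmapsto\; (2, -1, 5) \in F^{(3)}, \\
1 + x &= 1\cdot 1 + 1\cdot x + 0\cdot x^2 \;\longmapsto\; (1, 1, 0), \\
(2 - x + 5x^2) + (1 + x) &= 3 + 5x^2 \;\longmapsto\; (3, 0, 5) = (2, -1, 5) + (1, 1, 0).
\end{aligned}
```

The last line shows the map respecting addition in this one case, which is the kind of check the phrase "requisite properties of an isomorphism" refers to.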

A natural question then arises! Under what conditions on n and m are
$F^{(n)}$ and $F^{(m)}$ isomorphic? Our intuition suggests that this can only happen
when n = m. Why? For one thing, if F should be a field with a finite
number of elements (for instance, if $F = J_p$, the integers modulo the prime
number p) then $F^{(n)}$ has $p^n$ elements whereas $F^{(m)}$ has $p^m$ elements.
Isomorphism would imply that they have the same number of elements, and
so we would have n = m. From another point of view, if F were the field
of real numbers, then $F^{(n)}$ (in what may be a rather vague geometric way
to the reader) represents real n-space, and our geometric feeling tells us
that n-space is different from m-space for $n \neq m$. Thus we might expect
that if F is any field then $F^{(n)}$ is isomorphic to $F^{(m)}$ only if n = m.
Equivalently, from our earlier discussion, we should expect that any two bases of
V have the same number of elements. It is towards this goal that we prove
the next lemma.

LEMMA 4.2.4 If $v_1, \ldots, v_n$ is a basis of V over F and if $w_1, \ldots, w_m$ in V
are linearly independent over F, then $m \leq n$.

Proof. Every vector in V, so in particular $w_m$, is a linear combination
of $v_1, \ldots, v_n$. Therefore the vectors $w_m, v_1, \ldots, v_n$ are linearly dependent.
Moreover, they span V since $v_1, \ldots, v_n$ already do so. Thus some proper
subset of these $w_m, v_{i_1}, \ldots, v_{i_k}$ with $k \leq n - 1$ forms a basis of V. We
have “traded off” one w, in forming this new basis, for at least one $v_i$.
Repeat this procedure with the set $w_{m-1}, w_m, v_{i_1}, \ldots, v_{i_k}$. From this
linearly dependent set, by Corollary 1 to Theorem 4.2.1, we can extract a
basis of the form $w_{m-1}, w_m, v_{j_1}, \ldots, v_{j_s}$, $s \leq n - 2$. Keeping up this
procedure we eventually get down to a basis of V of the form
$w_2, \ldots, w_{m-1}, w_m, v_\alpha, v_\beta, \ldots$; since $w_1$ is not a linear combination of
$w_2, \ldots, w_m$, the above basis must actually include some v. To get to this
basis we have introduced m - 1 w's, each such introduction having cost us at
least one v, and yet there is a v left. Thus $m - 1 \leq n - 1$ and so $m \leq n$.

This lemma has as consequences (which we list as corollaries) the basic 
results spelling out the nature of the dimension of a vector space. These 
corollaries are of the utmost importance in all that follows, not only in this 
chapter but in the rest of the book, in fact in all of mathematics. The 
corollaries are all theorems in their own rights. 

COROLLARY 1 If V is finite-dimensional over F then any two bases of V
have the same number of elements.

Proof. Let $v_1, \ldots, v_n$ be one basis of V over F and let $w_1, \ldots, w_m$ be
another. In particular, $w_1, \ldots, w_m$ are linearly independent over F whence,
by Lemma 4.2.4, $m \leq n$. Now interchange the roles of the v's and w's and
we obtain that $n \leq m$. Together these say that n = m.

COROLLARY 2 $F^{(n)}$ is isomorphic to $F^{(m)}$ if and only if n = m.

Proof. $F^{(n)}$ has, as one basis, the set of n vectors, $(1, 0, \ldots, 0)$,
$(0, 1, 0, \ldots, 0)$, ..., $(0, 0, \ldots, 0, 1)$. Likewise $F^{(m)}$ has a basis containing m
vectors. An isomorphism maps a basis onto a basis (Problem 4, end of this
section), hence, by Corollary 1, m = n.

Corollary 2 puts on a firm footing the heuristic remarks made earlier
about the possible isomorphism of $F^{(n)}$ and $F^{(m)}$. As we saw in those
remarks, V is isomorphic to $F^{(n)}$ for some n. By Corollary 2, this n is unique, thus

COROLLARY 3 If V is finite-dimensional over F then V is isomorphic to $F^{(n)}$
for a unique integer n; in fact, n is the number of elements in any basis of V over F.

DEFINITION The integer n in Corollary 3 is called the dimension of V 
over F. 

The dimension of V over F is thus the number of elements in any basis
of V over F.

We shall write the dimension of V over F as dim V, or, the occasional
time in which we shall want to stress the role of the field F, as $\dim_F V$.

COROLLARY 4 Any two finite-dimensional vector spaces over F of the same
dimension are isomorphic.

Proof. If this dimension is n, then each is isomorphic to $F^{(n)}$, hence
they are isomorphic to each other.

How much freedom do we have in constructing bases of V? The next 
lemma asserts that starting with any linearly independent set of vectors 
we can “blow it up” to a basis of V. 

LEMMA 4.2.5 If V is finite-dimensional over F and if $u_1, \ldots, u_m \in V$ are
linearly independent, then we can find vectors $u_{m+1}, \ldots, u_{m+r}$ in V such that
$u_1, \ldots, u_m, u_{m+1}, \ldots, u_{m+r}$ is a basis of V.

Proof. Since V is finite-dimensional it has a basis; let $v_1, \ldots, v_n$ be a
basis of V. Since these span V, the vectors $u_1, \ldots, u_m, v_1, \ldots, v_n$ also span
V. By Corollary 1 to Theorem 4.2.1 there is a subset of these of the form
$u_1, \ldots, u_m, v_{i_1}, \ldots, v_{i_r}$ which consists of linearly independent elements
which span V. To prove the lemma merely put $u_{m+1} = v_{i_1}, \ldots, u_{m+r} = v_{i_r}$.
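
The proof above is constructive, and a short sketch may help in seeing it work. The code below assumes $V = F^{(n)}$ with F the rational numbers and uses the standard unit vectors as the known basis $v_1, \ldots, v_n$; the names `rank` and `extend_to_basis` are hypothetical helpers for this illustration only.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of rational vectors, found by Gaussian elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(independent, n):
    """Extend linearly independent vectors of F^(n) to a basis of F^(n).

    The unit vectors e_1, ..., e_n are appended to the given vectors and each
    one is kept only if it is not a combination of the vectors already chosen.
    """
    standard = [tuple(1 if i == j else 0 for j in range(n)) for i in range(n)]
    basis = list(independent)
    for e in standard:
        if rank(basis + [e]) > len(basis):
            basis.append(e)
    return basis


print(extend_to_basis([(1, 1, 0), (3, 1, 3)], 3))
# [(1, 1, 0), (3, 1, 3), (1, 0, 0)]
```

The two given vectors are kept untouched at the front, exactly as in the statement of the lemma, and only one unit vector is needed to complete the basis.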

What is the relation of the dimension of a homomorphic image of V to 
that of V? The answer is provided us by 

LEMMA 4.2.6 If V is finite-dimensional and if W is a subspace of V, then W
is finite-dimensional, $\dim W \leq \dim V$ and $\dim V/W = \dim V - \dim W$.

Proof. By Lemma 4.2.4, if n = dim V then any n + 1 elements in V
are linearly dependent; in particular, any n + 1 elements in W are linearly
dependent. Thus we can find a largest set of linearly independent elements
in W, $w_1, \ldots, w_m$ and $m \leq n$. If $w \in W$ then $w_1, \ldots, w_m, w$ is a linearly
dependent set, whence $\alpha w + \alpha_1 w_1 + \cdots + \alpha_m w_m = 0$, and not all of the
$\alpha$'s are 0. If $\alpha = 0$, by the linear independence of the $w_i$ we would get that
each $\alpha_i = 0$, a contradiction. Thus $\alpha \neq 0$, and so
$w = -\alpha^{-1}(\alpha_1 w_1 + \cdots + \alpha_m w_m)$. Consequently, $w_1, \ldots, w_m$ span W; by this,
W is finite-dimensional over F, and furthermore, it has a basis of m elements, where
$m \leq n$. From the definition of dimension it then follows that
$\dim W \leq \dim V$.

Now, let $w_1, \ldots, w_m$ be a basis of W. By Lemma 4.2.5, we can fill this
out to a basis, $w_1, \ldots, w_m, v_1, \ldots, v_r$ of V, where m + r = dim V and
m = dim W.

Let $\bar{v}_1, \ldots, \bar{v}_r$ be the images, in $\bar{V} = V/W$, of $v_1, \ldots, v_r$. Since any
vector $v \in V$ is of the form $v = \alpha_1 w_1 + \cdots + \alpha_m w_m + \beta_1 v_1 + \cdots + \beta_r v_r$,
then $\bar{v}$, the image of v, is of the form $\bar{v} = \beta_1 \bar{v}_1 + \cdots + \beta_r \bar{v}_r$ (since
$\bar{w}_1 = \bar{w}_2 = \cdots = \bar{w}_m = 0$). Thus $\bar{v}_1, \ldots, \bar{v}_r$ span V/W. We claim that
they are linearly independent, for if $\gamma_1 \bar{v}_1 + \cdots + \gamma_r \bar{v}_r = 0$ then
$\gamma_1 v_1 + \cdots + \gamma_r v_r \in W$, and so
$\gamma_1 v_1 + \cdots + \gamma_r v_r = \lambda_1 w_1 + \cdots + \lambda_m w_m$, which, by the
linear independence of the set $w_1, \ldots, w_m, v_1, \ldots, v_r$ forces
$\gamma_1 = \cdots = \gamma_r = \lambda_1 = \cdots = \lambda_m = 0$. We have shown that V/W has a basis of r
elements, and so, dim V/W = r = dim V - m = dim V - dim W.

COROLLARY If A and B are finite-dimensional subspaces of a vector space V,
then A + B is finite-dimensional and
$\dim(A + B) = \dim(A) + \dim(B) - \dim(A \cap B)$.

Proof. By the result of Problem 13 at the end of Section 4.1,

$$\frac{A + B}{B} \approx \frac{A}{A \cap B},$$

and since A and B are finite-dimensional, we get that

$$\dim(A + B) - \dim B = \dim\left(\frac{A + B}{B}\right) = \dim\left(\frac{A}{A \cap B}\right)
= \dim A - \dim(A \cap B).$$

Transposing yields the result stated in the corollary.
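
A small instance may make the dimension formula tangible. The subspaces below are chosen here only for illustration (they do not appear in the proof); they are two coordinate planes in $F^{(3)}$, written with the linear-span notation L(S) of this section.

```latex
\begin{aligned}
A &= L(\{(1,0,0),\,(0,1,0)\}), \qquad B = L(\{(0,1,0),\,(0,0,1)\}) \subset F^{(3)}, \\
A + B &= F^{(3)}, \qquad A \cap B = L(\{(0,1,0)\}), \\
\dim(A + B) &= 3 = 2 + 2 - 1 = \dim A + \dim B - \dim(A \cap B).
\end{aligned}
```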

Problems 

1. Prove Lemma 4.2.2. 

2. (a) If F is the field of real numbers, prove that the vectors (1, 1, 0, 0),
(0, 1, -1, 0), and (0, 0, 0, 3) in $F^{(4)}$ are linearly independent
over F.

(b) What conditions on the characteristic of F would make the three
vectors in (a) linearly dependent?

3. If V has a basis of n elements, give a detailed proof that V is isomorphic
to $F^{(n)}$.

4. If T is an isomorphism of V onto W, prove that T maps a basis of V
onto a basis of W. 

5. If V is finite-dimensional and T is an isomorphism of V into V, prove 
that T must map V onto V. 

6. If V is finite-dimensional and T is a homomorphism of V onto V, 
prove that T must be one-to-one, and so an isomorphism. 

7. If V is of dimension n, show that any set of n linearly independent
vectors in V forms a basis of V.

8. If V is finite-dimensional and W is a subspace of V such that dim V =
dim W, prove that V = W.

9. If V is finite-dimensional and T is a homomorphism of V into itself
which is not onto, prove that there is some $v \neq 0$ in V such that
vT = 0.

10. Let F be a field and let F[x] be the polynomials in x over F. Prove
that F[x] is not finite-dimensional over F.

11. Let $V_n = \{p(x) \in F[x] \mid \deg p(x) < n\}$. Define T by
$(\alpha_0 + \alpha_1 x + \cdots + \alpha_{n-1} x^{n-1})T
= \alpha_0 + \alpha_1 (x + 1) + \alpha_2 (x + 1)^2 + \cdots + \alpha_{n-1} (x + 1)^{n-1}$.

Prove that T is an isomorphism of $V_n$ onto itself.

12. Let $W = \{\alpha_0 + \alpha_1 x + \cdots + \alpha_{n-1} x^{n-1} \in F[x] \mid
\alpha_0 + \alpha_1 + \cdots + \alpha_{n-1} = 0\}$. Show that W is a subspace of $V_n$ and
find a basis of W over F.

13. Let $v_1, \ldots, v_n$ be a basis of V and let $w_1, \ldots, w_n$ be any n elements
in V. Define T on V by $(\lambda_1 v_1 + \cdots + \lambda_n v_n)T = \lambda_1 w_1 + \cdots + \lambda_n w_n$.

(a) Show that T is a homomorphism of V into itself.

(b) When is T an isomorphism?

14. Show that any homomorphism of V into itself, when V is
finite-dimensional, can be realized as in Problem 13 by choosing appropriate
elements $w_1, \ldots, w_n$.

15. Returning to Problem 13, since $v_1, \ldots, v_n$ is a basis of V, each
$w_i = \alpha_{i1} v_1 + \cdots + \alpha_{in} v_n$, $\alpha_{ij} \in F$. Show that the $n^2$ elements
$\alpha_{ij}$ of F determine the homomorphism T.

*16. If $\dim_F V = n$ prove that $\dim_F(\mathrm{Hom}(V, V)) = n^2$.

17. If V is finite-dimensional and W is a subspace of V prove that there
is a subspace $W_1$ of V such that $V = W \oplus W_1$.

4.3 Dual Spaces 

Given any two vector spaces, V and W, over a field F, we have defined
Hom(V, W) to be the set of all vector space homomorphisms of V into W.
As yet Hom(V, W) is merely a set with no structure imposed on it. We
shall now proceed to introduce operations in it which will turn it into a
vector space over F. Actually we have already indicated how to do so in
the descriptions of some of the problems in the earlier sections. However
we propose to treat the matter more formally here.

Let S and T be any two elements of Hom(V, W); this means that these
are both vector space homomorphisms of V into W. Recalling the definition
of such a homomorphism, we must have $(v_1 + v_2)S = v_1 S + v_2 S$ and
$(\alpha v_1)S = \alpha(v_1 S)$ for all $v_1, v_2 \in V$ and all $\alpha \in F$. The same conditions also
hold for T.

We first want to introduce an addition for these elements S and T in
Hom(V, W). What is more natural than to define S + T by declaring
$v(S + T) = vS + vT$ for all $v \in V$? We must, of course, verify that S + T
is in Hom(V, W). By the very definition of S + T, if $v_1, v_2 \in V$, then
$(v_1 + v_2)(S + T) = (v_1 + v_2)S + (v_1 + v_2)T$; since $(v_1 + v_2)S = v_1 S + v_2 S$
and $(v_1 + v_2)T = v_1 T + v_2 T$ and since addition in W is commutative, we
get $(v_1 + v_2)(S + T) = v_1 S + v_1 T + v_2 S + v_2 T$. Once again invoking
the definition of S + T the right-hand side of this relation becomes
$v_1(S + T) + v_2(S + T)$; we have shown that $(v_1 + v_2)(S + T) =
v_1(S + T) + v_2(S + T)$. A similar computation shows that $(\alpha v)(S + T) =
\alpha(v(S + T))$. Consequently S + T is in Hom(V, W). Let 0 be that
homomorphism of V into W which sends every element of V onto the
zero-element of W; for $S \in$ Hom(V, W) let -S be defined by $v(-S) = -(vS)$.
It is immediate that Hom(V, W) is an abelian group under the addition
defined above.

Having succeeded in introducing the structure of an abelian group on
Hom(V, W), we now turn our attention to defining $\lambda S$ for $\lambda \in F$ and
$S \in$ Hom(V, W), our ultimate goal being that of making Hom(V, W)
into a vector space over F. For $\lambda \in F$ and $S \in$ Hom(V, W) we define
$\lambda S$ by $v(\lambda S) = \lambda(vS)$ for all $v \in V$. We leave it to the reader to show that
$\lambda S$ is in Hom(V, W) and that under the operations we have defined,
Hom(V, W) is a vector space over F. But we have no assurance that
Hom(V, W) has any elements other than the zero-homomorphism. Be
that as it may, we have proved

LEMMA 4.3.1 Hom(V, W) is a vector space over F under the operations
described above.

A result such as that of Lemma 4.3.1 really gives us very little information; 
rather it confirms for us that the definitions we have made are reasonable. 
We would prefer some results about Hom(V, W) that have more of a
bite to them. Such a result is provided us in

THEOREM 4.3.1 If V and W are of dimensions m and n, respectively, over F,
then Hom(V, W) is of dimension mn over F.

Proof. We shall prove the theorem by explicitly exhibiting a basis of
Hom(V, W) over F consisting of mn elements.

Let $v_1, \ldots, v_m$ be a basis of V over F and $w_1, \ldots, w_n$ one for W over F.
If $v \in V$ then $v = \lambda_1 v_1 + \cdots + \lambda_m v_m$ where $\lambda_1, \ldots, \lambda_m$ are uniquely
defined elements of F; define $T_{ij}: V \to W$ by $vT_{ij} = \lambda_i w_j$. From the point
of view of the bases involved we are simply letting $v_k T_{ij} = 0$ for $k \neq i$
and $v_i T_{ij} = w_j$. It is an easy exercise to see that $T_{ij}$ is in Hom(V, W).
Since i can be any of 1, 2, ..., m and j any of 1, 2, ..., n there are mn
such $T_{ij}$'s.

Our claim is that these mn elements constitute a basis of Hom(V, W)
over F. For, let $S \in$ Hom(V, W); since $v_1 S \in W$, and since any element
in W is a linear combination over F of $w_1, \ldots, w_n$,
$v_1 S = \alpha_{11} w_1 + \alpha_{12} w_2 + \cdots + \alpha_{1n} w_n$, for some
$\alpha_{11}, \alpha_{12}, \ldots, \alpha_{1n}$ in F. In fact, $v_i S = \alpha_{i1} w_1 + \cdots + \alpha_{in} w_n$
for i = 1, 2, ..., m. Consider
$S_0 = \alpha_{11} T_{11} + \alpha_{12} T_{12} + \cdots + \alpha_{1n} T_{1n} + \alpha_{21} T_{21} + \cdots + \alpha_{2n} T_{2n}
+ \cdots + \alpha_{i1} T_{i1} + \cdots + \alpha_{in} T_{in} + \cdots + \alpha_{m1} T_{m1} + \cdots + \alpha_{mn} T_{mn}$.
Let us compute $v_k S_0$ for the basis vector $v_k$. Now
$v_k S_0 = v_k(\alpha_{11} T_{11} + \cdots + \alpha_{m1} T_{m1} + \cdots + \alpha_{mn} T_{mn})
= \alpha_{11}(v_k T_{11}) + \alpha_{12}(v_k T_{12}) + \cdots + \alpha_{m1}(v_k T_{m1}) + \cdots + \alpha_{mn}(v_k T_{mn})$.
Since $v_k T_{ij} = 0$ for $i \neq k$ and $v_k T_{kj} = w_j$, this sum reduces to
$v_k S_0 = \alpha_{k1} w_1 + \cdots + \alpha_{kn} w_n$,
which, we see, is nothing but $v_k S$. Thus the homomorphisms $S_0$ and S agree
on a basis of V. We claim this forces $S_0 = S$ (see Problem 3, end of this
section). However $S_0$ is a linear combination of the $T_{ij}$'s, whence S must
be the same linear combination. In short, we have shown that the mn
elements $T_{11}, T_{12}, \ldots, T_{1n}, \ldots, T_{m1}, \ldots, T_{mn}$ span Hom(V, W) over F.

In order to prove that they form a basis of Hom(V, W) over F there
remains but to show their linear independence over F. Suppose that
$\beta_{11} T_{11} + \beta_{12} T_{12} + \cdots + \beta_{1n} T_{1n} + \cdots + \beta_{i1} T_{i1} + \cdots + \beta_{in} T_{in}
+ \cdots + \beta_{m1} T_{m1} + \cdots + \beta_{mn} T_{mn} = 0$ with $\beta_{ij}$ all in F. Applying this to
$v_k$ we get
$0 = v_k(\beta_{11} T_{11} + \cdots + \beta_{ij} T_{ij} + \cdots + \beta_{mn} T_{mn})
= \beta_{k1} w_1 + \beta_{k2} w_2 + \cdots + \beta_{kn} w_n$
since $v_k T_{ij} = 0$ for $i \neq k$ and $v_k T_{kj} = w_j$. However, $w_1, \ldots, w_n$
are linearly independent over F, forcing $\beta_{kj} = 0$ for all k and j. Thus the
$T_{ij}$ are linearly independent over F, whence they indeed do form a basis
of Hom(V, W) over F.
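
In coordinates, the $T_{ij}$ of this proof behave like matrices with a single entry 1. The sketch below assumes that a homomorphism is recorded as an m-by-n array whose i-th row lists the coefficients of $v_i S$ in terms of $w_1, \ldots, w_n$, and that vectors act from the left as in the notation vT; this matrix bookkeeping, the sample array `S`, and the helper names are assumptions made for illustration and are not part of the proof.

```python
def matrix_unit(i, j, m, n):
    """Array of T_ij: 1 in row i, column j (0-based indices), 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(n)] for r in range(m)]


def apply(v, M):
    """Row vector v times array M, matching the notation vT."""
    return [sum(v[r] * M[r][c] for r in range(len(v))) for c in range(len(M[0]))]


def combine(alpha, m, n):
    """The linear combination: sum over i, j of alpha[i][j] * T_ij."""
    total = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            unit = matrix_unit(i, j, m, n)
            for r in range(m):
                for c in range(n):
                    total[r][c] += alpha[i][j] * unit[r][c]
    return total


m, n = 2, 3
S = [[2, 0, -1], [5, 7, 3]]  # hypothetical homomorphism: row i holds the alpha_ij

# S is recovered as the combination S_0 of the T_ij with coefficients alpha_ij.
print(combine(S, m, n) == S)                    # True

# v_1 T_13 = w_3 and v_2 T_13 = 0 (indices shifted by one in the code).
print(apply([1, 0], matrix_unit(0, 2, m, n)))   # [0, 0, 1]
print(apply([0, 1], matrix_unit(0, 2, m, n)))   # [0, 0, 0]
```

There are m times n such arrays, one for each position, which matches the dimension count mn in the theorem.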

An immediate consequence of Theorem 4.3.1 is that whenever $V \neq (0)$
and $W \neq (0)$ are finite-dimensional vector spaces, then Hom(V, W) does
not just consist of the element 0, for its dimension over F is $nm \geq 1$.

Some special cases of Theorem 4.3.1 are themselves of great interest and 
we list these as corollaries. 

COROLLARY 1 If $\dim_F V = m$ then $\dim_F \mathrm{Hom}(V, V) = m^2$.

Proof. In the theorem put V = W, and so m = n, whence $mn = m^2$.

COROLLARY 2 If $\dim_F V = m$ then $\dim_F \mathrm{Hom}(V, F) = m$.

Proof. As a vector space F is of dimension 1 over F. Applying the
theorem yields $\dim_F \mathrm{Hom}(V, F) = m$.

Corollary 2 has the interesting consequence that if V is finite-dimensional
over F it is isomorphic to Hom(V, F), for, by the corollary, they are of
the same dimension over F, whence by Corollary 4 to Lemma 4.2.4 they
must be isomorphic. This isomorphism has many shortcomings! Let us
explain. It depends heavily on the finite-dimensionality of V, for if V is
not finite-dimensional no such isomorphism exists. There is no nice, formal
construction of this isomorphism which holds universally for all vector
spaces. It depends strongly on the specialities of the finite-dimensional
situation. In a few pages we shall, however, show that a “nice” isomorphism
does exist for any vector space V into Hom(Hom(V, F), F).

DEFINITION If V is a vector space then its dual space is Hom(V, F).

We shall use the notation $\hat{V}$ for the dual space of V. An element of $\hat{V}$
will be called a linear functional on V into F.

If V is not finite-dimensional then $\hat{V}$ is usually too large and wild to be
of interest. For such vector spaces we often have other additional structures,
such as a topology, imposed and then, as the dual space, one does not generally
take all of our $\hat{V}$ but rather a properly restricted subspace. If V is
finite-dimensional its dual space $\hat{V}$ is always defined, as we did it, as all of Hom(V, F).

In the proof of Theorem 4.3.1 we constructed a basis of Hom(V, W)
using a particular basis of V and one of W. The construction depended
crucially on the particular bases we had chosen for V and W, respectively.
Had we chosen other bases we would have ended up with a different basis
of Hom(V, W). As a general principle, it is preferable to give proofs,
whenever possible, which are basis-free. Such proofs are usually referred to 
as invariant ones. An invariant proof or construction has the advantage, 
other than the mere aesthetic one, over a proof or construction using a 
basis, in that one does not have to worry how finely everything depends 
on a particular choice of bases. 

The elements of $\hat{V}$ are functions defined on V and having their values
in F. In keeping with the functional notation, we shall usually write
elements of $\hat{V}$ as f, g, etc. and denote the value on $v \in V$ as f(v) (rather
than as vf).

Let V be a finite-dimensional vector space over F and let $v_1, \ldots, v_n$ be
a basis of V; let $\hat{v}_i$ be the element of $\hat{V}$ defined by $\hat{v}_i(v_j) = 0$ for $i \neq j$,
$\hat{v}_i(v_i) = 1$, and $\hat{v}_i(\alpha_1 v_1 + \cdots + \alpha_i v_i + \cdots + \alpha_n v_n) = \alpha_i$. In fact the
$\hat{v}_i$ are nothing but the $T_{ij}$ introduced in the proof of Theorem 4.3.1, for here
W = F is one-dimensional over F. Thus we know that $\hat{v}_1, \ldots, \hat{v}_n$ form a
basis of $\hat{V}$. We call this basis the dual basis of $v_1, \ldots, v_n$. If $v \neq 0 \in V$, by
Lemma 4.2.5 we can find a basis of the form $v_1 = v, v_2, \ldots, v_n$ and so
there is an element in $\hat{V}$, namely $\hat{v}_1$, such that $\hat{v}_1(v_1) = \hat{v}_1(v) = 1 \neq 0$.

We have proved

LEMMA 4.3.2 If V is finite-dimensional and $v \neq 0 \in V$, then there is an
element $f \in \hat{V}$ such that $f(v) \neq 0$.
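
For $V = F^{(n)}$ the dual basis can be computed explicitly. The sketch below assumes F is the field of rational numbers and that a functional is stored as a list of coefficients $c$, acting by $f(v) = c_1 v_1 + \cdots + c_n v_n$ on the coordinates of v. Under that assumption, if the basis vectors are the rows of a matrix B, the dual basis functionals are the columns of the inverse of B. The function names and the sample basis are hypothetical.

```python
from fractions import Fraction


def inverse(B):
    """Inverse of an invertible square rational matrix, by Gauss-Jordan elimination."""
    n = len(B)
    # Augment B with the identity matrix.
    A = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(B)]
    for col in range(n):
        # Raises StopIteration if B is singular, i.e. the rows are not a basis.
        pivot = next(i for i in range(col, n) if A[i][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [x / p for x in A[col]]
        for i in range(n):
            if i != col:
                factor = A[i][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [row[n:] for row in A]


def dual_basis(basis):
    """Coefficient lists of the functionals f_i with f_i(v_j) = 1 if i == j else 0."""
    inv = inverse(basis)  # the rows of `basis` are v_1, ..., v_n
    n = len(basis)
    return [[inv[k][i] for k in range(n)] for i in range(n)]  # columns of the inverse


def evaluate(f, v):
    """Value f(v) of the functional with coefficient list f at the vector v."""
    return sum(a * b for a, b in zip(f, v))


basis = [(1, 1, 0), (3, 1, 3), (0, 0, 1)]
dual = dual_basis(basis)
table = [[int(evaluate(f, v)) for v in basis] for f in dual]
print(table)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Choosing a different basis of the same space changes every functional in the dual basis, which is one way to see why the text calls this construction basis-dependent.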

In fact, Lemma 4.3.2 is true if V is infinite-dimensional, but as we have 
no need for the result, and since its proof would involve logical questions 
that are not relevant at this time, we omit the proof. 

Let $v_0 \in V$, where V is any vector space over F. As f varies over $\hat{V}$, and
$v_0$ is kept fixed, $f(v_0)$ defines a functional on $\hat{V}$ into F; note that we are merely
interchanging the role of function and variable. Let us denote this function by $T_{v_0}$;
in other words $T_{v_0}(f) = f(v_0)$ for any $f \in \hat{V}$. What can we say about
$T_{v_0}$? To begin with, $T_{v_0}(f + g) = (f + g)(v_0) = f(v_0) + g(v_0) =
T_{v_0}(f) + T_{v_0}(g)$; furthermore, $T_{v_0}(\lambda f) = (\lambda f)(v_0) = \lambda f(v_0) = \lambda T_{v_0}(f)$.
Thus $T_{v_0}$ is in the dual space of $\hat{V}$! We write this space as $\hat{\hat{V}}$ and refer to
it as the second dual of V.

Given any element $v \in V$ we can associate with it an element $T_v$ in $\hat{\hat{V}}$.
Define the mapping $\psi: V \to \hat{\hat{V}}$ by $v\psi = T_v$ for every $v \in V$. Is $\psi$ a
homomorphism of V into $\hat{\hat{V}}$? Indeed it is! For, $T_{v+w}(f) = f(v + w) = f(v) +
f(w) = T_v(f) + T_w(f) = (T_v + T_w)(f)$, and so $T_{v+w} = T_v + T_w$,
that is, $(v + w)\psi = v\psi + w\psi$. Similarly for $\lambda \in F$, $(\lambda v)\psi = \lambda(v\psi)$. Thus
$\psi$ defines a homomorphism of V into $\hat{\hat{V}}$. The construction of $\psi$ used no
basis or special properties of V; it is an example of an invariant construction.

When is $\psi$ an isomorphism? To answer this we must know when $v\psi = 0$,
or equivalently, when $T_v = 0$. But if $T_v = 0$, then $0 = T_v(f) = f(v)$
for all $f \in \hat{V}$. However as we pointed out, without proof, for a general
vector space, given $v \neq 0$ there is an $f \in \hat{V}$ with $f(v) \neq 0$. We actually
proved this when V is finite-dimensional. Thus for V finite-dimensional
(and, in fact, for arbitrary V) $\psi$ is an isomorphism. However, when V is
finite-dimensional $\psi$ is an isomorphism onto $\hat{\hat{V}}$; when V is
infinite-dimensional $\psi$ is not onto.

If V is finite-dimensional, by the second corollary to Theorem 4.3.1, V
and $\hat{V}$ are of the same dimension; similarly, $\hat{V}$ and $\hat{\hat{V}}$ are of the same
dimension; since $\psi$ is an isomorphism of V into $\hat{\hat{V}}$, the equality of the dimensions
forces $\psi$ to be onto. We have proved

LEMMA 4.3.3 If V is finite-dimensional, then $\psi$ is an isomorphism of V onto $\hat{\hat{V}}$.

We henceforth identify V and $\hat{\hat{V}}$, keeping in mind that this identification
is being carried out by the isomorphism $\psi$.
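
The interchange of function and variable behind $\psi$ can be imitated directly with functions as values. The sketch below assumes functionals on $F^{(3)}$ given by integer coefficients; the names `functional` and `T` and the sample data are hypothetical and serve only to illustrate the definition $T_v(f) = f(v)$.

```python
def functional(coeffs):
    """The linear functional f(v) = sum of coeffs[k] * v[k] on F^(n)."""
    return lambda v: sum(a * b for a, b in zip(coeffs, v))


def T(v):
    """T_v, the element of the second dual that evaluates a functional at v."""
    return lambda f: f(v)


f = functional((1, 2, 3))
g = functional((0, -1, 4))
f_plus_g = lambda v: f(v) + g(v)   # the sum f + g in the dual space

v = (2, 5, 1)
w = (1, 0, -2)
v_plus_w = tuple(a + b for a, b in zip(v, w))

# T_v is additive in its argument: T_v(f + g) = T_v(f) + T_v(g).
print(T(v)(f_plus_g) == T(v)(f) + T(v)(g))   # True

# The map v -> T_v is additive: T_{v+w}(f) = T_v(f) + T_w(f).
print(T(v_plus_w)(f) == T(v)(f) + T(w)(f))   # True
```

Nothing in `T` refers to a basis, which mirrors the remark that the construction of $\psi$ is invariant.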

DEFINITION If W is a subspace of V then the annihilator of W,
$A(W) = \{f \in \hat{V} \mid f(w) = 0 \text{ all } w \in W\}$.

We leave as an exercise to the reader the verification of the fact that
A(W) is a subspace of $\hat{V}$. Clearly if $U \subset W$, then $A(U) \supset A(W)$.




Let W be a subspace of V, where V is finite-dimensional. If $f \in \hat{V}$ let
$\bar{f}$ be the restriction of f to W; thus $\bar{f}$ is defined on W by $\bar{f}(w) = f(w)$ for
every $w \in W$. Since $f \in \hat{V}$, clearly $\bar{f} \in \hat{W}$. Consider the mapping
$T: \hat{V} \to \hat{W}$ defined by $fT = \bar{f}$ for $f \in \hat{V}$. It is immediate that
$(f + g)T = fT + gT$ and that $(\lambda f)T = \lambda(fT)$. Thus T is a homomorphism of
$\hat{V}$ into $\hat{W}$. What is the kernel of T? If f is in the kernel of T then the
restriction of f to W must be 0; that is, f(w) = 0 for all $w \in W$. Also,
conversely, if f(w) = 0 for all $w \in W$ then f is in the kernel of T. Therefore the
kernel of T is exactly A(W).

We now claim that the mapping T is onto $\hat{W}$. What we must show is
that given any element $h \in \hat{W}$, then h is the restriction of some $f \in \hat{V}$, that
is $h = \bar{f}$. By Lemma 4.2.5, if $w_1, \ldots, w_m$ is a basis of W then it can be
expanded to a basis of V of the form $w_1, \ldots, w_m, v_1, \ldots, v_r$ where r + m =
dim V. Let $W_1$ be the subspace of V spanned by $v_1, \ldots, v_r$. Thus
$V = W \oplus W_1$. If $h \in \hat{W}$ define $f \in \hat{V}$ by: let $v \in V$ be written as $v = w + w_1$,
$w \in W$, $w_1 \in W_1$; then f(v) = h(w). It is easy to see that f is in $\hat{V}$ and that
$\bar{f} = h$. Thus h = fT and so T maps $\hat{V}$ onto $\hat{W}$. Since the kernel of T is
A(W), by Theorem 4.1.1, $\hat{W}$ is isomorphic to $\hat{V}/A(W)$. In particular they
have the same dimension. Let m = dim W, n = dim V, and r = dim A(W).
By Corollary 2 to Theorem 4.3.1, $m = \dim \hat{W}$ and $n = \dim \hat{V}$.
However, by Lemma 4.2.6 $\dim \hat{V}/A(W) = \dim \hat{V} - \dim A(W) = n - r$,
and so m = n - r. Transposing, r = n - m. We have proved

THEOREM 4.3.2 If V is finite-dimensional and W is a subspace of V, then
$\hat{W}$ is isomorphic to $\hat{V}/A(W)$ and dim A(W) = dim V - dim W.

COROLLARY A(A(W)) = W.

Proof. Remember that in order for the corollary even to make sense,
since $W \subset V$ and $A(A(W)) \subset \hat{\hat{V}}$, we have identified V with $\hat{\hat{V}}$. Now
$W \subset A(A(W))$, for if $w \in W$ then $w\psi = T_w$ acts on $\hat{V}$ by $T_w(f) = f(w)$ and
so is 0 for all $f \in A(W)$. However, $\dim A(A(W)) = \dim \hat{V} - \dim A(W)$
(applying the theorem to the vector space $\hat{V}$ and its subspace A(W)) so
that $\dim A(A(W)) = \dim \hat{V} - \dim A(W) = \dim V - (\dim V - \dim W) =
\dim W$. Since $W \subset A(A(W))$ and they are of the same dimension, it
follows that W = A(A(W)).

Theorem 4.3.2 has application to the study of systems of linear homogeneous
equations. Consider the system of m equations in n unknowns

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= 0, \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= 0, \\
&\;\;\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= 0,
\end{aligned}$$

where the $a_{ij}$ are in F. We ask for the number of linearly independent
solutions $(x_1, \ldots, x_n)$ there are in $F^{(n)}$ to this system.

In $F^{(n)}$ let U be the subspace generated by the m vectors
$(a_{11}, a_{12}, \ldots, a_{1n})$, $(a_{21}, a_{22}, \ldots, a_{2n})$, ..., $(a_{m1}, a_{m2}, \ldots, a_{mn})$
and suppose that U is of dimension r. In that case we say the system of
equations is of rank r.

Let $v_1 = (1, 0, \ldots, 0)$, $v_2 = (0, 1, 0, \ldots, 0)$, ..., $v_n = (0, 0, \ldots, 0, 1)$
be used as a basis of $F^{(n)}$ and let $\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_n$ be its dual basis in
$\widehat{F^{(n)}}$. Any $f \in \widehat{F^{(n)}}$ is of the form
$f = x_1 \hat{v}_1 + x_2 \hat{v}_2 + \cdots + x_n \hat{v}_n$, where the
$x_i \in F$. When is $f \in A(U)$? In that case, since $(a_{11}, \ldots, a_{1n}) \in U$,

$$\begin{aligned}
0 &= f(a_{11}, a_{12}, \ldots, a_{1n}) \\
  &= f(a_{11} v_1 + \cdots + a_{1n} v_n) \\
  &= (x_1 \hat{v}_1 + x_2 \hat{v}_2 + \cdots + x_n \hat{v}_n)(a_{11} v_1 + \cdots + a_{1n} v_n) \\
  &= x_1 a_{11} + x_2 a_{12} + \cdots + x_n a_{1n}
\end{aligned}$$

since $\hat{v}_i(v_j) = 0$ for $i \neq j$ and $\hat{v}_i(v_i) = 1$. Similarly the other equations of the
system are satisfied. Conversely, every solution $(x_1, \ldots, x_n)$ of the system
of homogeneous equations yields an element, $x_1 \hat{v}_1 + \cdots + x_n \hat{v}_n$, in A(U).
Thereby we see that the number of linearly independent solutions of the
system of equations is the dimension of A(U), which, by Theorem 4.3.2 is
n - r. We have proved the following:

THEOREM 4.3.3 If the system of homogeneous linear equations

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= 0,\\
a_{21}x_1 + \cdots + a_{2n}x_n &= 0,\\
&\;\;\vdots\\
a_{m1}x_1 + \cdots + a_{mn}x_n &= 0,
\end{aligned}
\]

where $a_{ij} \in F$, is of rank $r$, then there are $n - r$ linearly independent solutions in
$F^{(n)}$.

COROLLARY If $n > m$, that is, if the number of unknowns exceeds the number
of equations, then there is a solution $(x_1, \ldots, x_n)$ where not all of $x_1, \ldots, x_n$ are 0.

Proof. Since $U$ is generated by $m$ vectors, and $m < n$, $r = \dim U \le m < n$;
applying Theorem 4.3.3 yields the corollary.
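The count in Theorem 4.3.3 can be checked on a small case. The following is a minimal sketch, assuming rational coefficients; the function name `rank_and_null_basis` and the sample system are illustrative and not part of the text. It row-reduces the coefficient rows, reads off the rank $r$, and builds $n - r$ independent solutions from the free unknowns.

```python
from fractions import Fraction

def rank_and_null_basis(rows):
    """Row-reduce a homogeneous system with rational coefficients.

    rows: list of coefficient rows (a_i1, ..., a_in).
    Returns (r, basis): the rank r and a list of n - r independent solutions.
    """
    a = [[Fraction(x) for x in row] for row in rows]
    m, n = len(a), len(a[0])
    pivots = []  # pivot column belonging to each pivot row
    r = 0
    for col in range(n):
        # Look for a row at or below position r with a nonzero entry in this column.
        pr = next((i for i in range(r, m) if a[i][col] != 0), None)
        if pr is None:
            continue
        a[r], a[pr] = a[pr], a[r]
        piv = a[r][col]
        a[r] = [x / piv for x in a[r]]
        # Clear this column in every other row.
        for i in range(m):
            if i != r and a[i][col] != 0:
                factor = a[i][col]
                a[i] = [x - factor * y for x, y in zip(a[i], a[r])]
        pivots.append(col)
        r += 1
        if r == m:
            break
    free = [c for c in range(n) if c not in pivots]
    basis = []
    for fc in free:
        # Set one free unknown to 1, the others to 0, and solve for the pivot unknowns.
        sol = [Fraction(0)] * n
        sol[fc] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            sol[pc] = -a[row_idx][fc]
        basis.append(sol)
    return r, basis

# Illustrative system: m = 2 equations in n = 4 unknowns.
system = [[1, 2, 0, 1],
          [2, 4, 1, 0]]
r, basis = rank_and_null_basis(system)
```

Here the two rows are independent, so the rank is 2 and the sketch returns $4 - 2 = 2$ independent solutions, in line with the corollary ($n > m$ forces a nonzero solution).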

Problems 

1. Prove that A(W) is a subspace of V. 

2. If $S$ is a subset of $V$ let $A(S) = \{f \in \hat{V} \mid f(s) = 0 \text{ all } s \in S\}$. Prove
that $A(S) = A(L(S))$, where $L(S)$ is the linear span of $S$.

3. If $S, T \in \mathrm{Hom}(V, W)$ and $v_iS = v_iT$ for all elements $v_i$ of a basis
of $V$, prove that $S = T$.

4. Complete the proof, with all details, that $\mathrm{Hom}(V, W)$ is a vector
space over $F$.

5. If $\psi$ denotes the mapping used in the text of $V$ into $\hat{\hat{V}}$, give a complete
proof that $\psi$ is a vector space homomorphism of $V$ into $\hat{\hat{V}}$.

6. If $V$ is finite-dimensional and $v_1 \neq v_2$ are in $V$, prove that there is an
$f \in \hat{V}$ such that $f(v_1) \neq f(v_2)$.

7. If $W_1$ and $W_2$ are subspaces of $V$, which is finite-dimensional, describe
$A(W_1 + W_2)$ in terms of $A(W_1)$ and $A(W_2)$.

8. If $V$ is finite-dimensional and $W_1$ and $W_2$ are subspaces of $V$, describe
$A(W_1 \cap W_2)$ in terms of $A(W_1)$ and $A(W_2)$.

9. If $F$ is the field of real numbers, find $A(W)$ where

(a) $W$ is spanned by $(1, 2, 3)$ and $(0, 4, -1)$.

(b) $W$ is spanned by $(0, 0, 1, -1)$, $(2, 1, 1, 0)$, and $(2, 1, 1, -1)$.

10. Find the ranks of the following systems of homogeneous linear equations
over $F$, the field of real numbers, and find all the solutions.

(a) $x_1 + 2x_2 - 3x_3 + 4x_4 = 0$,
    $x_1 + 3x_2 - x_3 = 0$,
    $6x_1 + x_3 + 2x_4 = 0$.

(b) $x_1 + 3x_2 + x_3 = 0$,
    $x_1 + 4x_2 + x_3 = 0$.

(c) $x_1 + x_2 + x_3 + x_4 + x_5 = 0$,
    $x_1 + 2x_2 = 0$,
    $4x_1 + 7x_2 + x_3 + x_4 + x_5 = 0$,
    $x_2 - x_3 - x_4 - x_5 = 0$.

11. If $f$ and $g$ are in $\hat{V}$ such that $f(v) = 0$ implies $g(v) = 0$, prove that
$g = \lambda f$ for some $\lambda \in F$.

4.4 Inner Product Spaces 

In our discussion of vector spaces the specific nature of $F$ as a field, other
than the fact that it is a field, has played virtually no role. In this section
we no longer consider vector spaces $V$ over arbitrary fields $F$; rather, we
restrict $F$ to be the field of real or complex numbers. In the first case $V$
is called a real vector space, in the second, a complex vector space.

We all have had some experience with real vector spaces; in fact both
analytic geometry and the subject matter of vector analysis deal with these.
What concepts used there can we carry over to a more abstract setting?
To begin with, we had in these concrete examples the idea of length;
secondly we had the idea of perpendicularity, or, more generally, that of
angle. These became special cases of the notion of a dot product (often
called a scalar or inner product).

Let us recall some properties of dot product as it pertained to the special
case of the three-dimensional real vectors. Given the vectors $v = (x_1, x_2, x_3)$
and $w = (y_1, y_2, y_3)$, where the $x$'s and $y$'s are real numbers, the dot product
of $v$ and $w$, denoted by $v \cdot w$, was defined as $v \cdot w = x_1y_1 + x_2y_2 + x_3y_3$.
Note that the length of $v$ is given by $\sqrt{v \cdot v}$ and the angle $\theta$ between
$v$ and $w$ is determined by

\[ \cos\theta = \frac{v \cdot w}{\sqrt{v \cdot v}\,\sqrt{w \cdot w}}. \]

What formal properties does this dot product enjoy? We list a few:

1. $v \cdot v \ge 0$ and $v \cdot v = 0$ if and only if $v = 0$;

2. $v \cdot w = w \cdot v$;

3. $u \cdot (\alpha v + \beta w) = \alpha(u \cdot v) + \beta(u \cdot w)$;

for any vectors $u, v, w$ and real numbers $\alpha, \beta$.

Everything that has been said can be carried over to complex vector
spaces. However, to get geometrically reasonable definitions we must make
some modifications. If we simply define $v \cdot w = x_1y_1 + x_2y_2 + x_3y_3$ for
$v = (x_1, x_2, x_3)$ and $w = (y_1, y_2, y_3)$, where the $x$'s and $y$'s are complex
numbers, then it is quite possible that $v \cdot v = 0$ with $v \neq 0$; this is illustrated
by the vector $v = (1, i, 0)$. In fact, $v \cdot v$ need not even be real. If,
as in the real case, we should want $v \cdot v$ to represent somehow the length of
$v$, we should like that this length be real and that a nonzero vector should
not have zero length.

We can achieve this much by altering the definition of dot product
slightly. If $\bar{\alpha}$ denotes the complex conjugate of the complex number $\alpha$,
returning to the $v$ and $w$ of the paragraph above let us define
$v \cdot w = x_1\bar{y}_1 + x_2\bar{y}_2 + x_3\bar{y}_3$. For real vectors this new definition coincides with
the old one; on the other hand, for arbitrary complex vectors $v \neq 0$, not
only is $v \cdot v$ real, it is in fact positive. Thus we have the possibility of introducing,
in a natural way, a nonnegative length. However, we do lose
something; for instance it is no longer true that $v \cdot w = w \cdot v$. In fact the
exact relationship between these is $v \cdot w = \overline{w \cdot v}$. Let us list a few properties
of this dot product:

1. $v \cdot w = \overline{w \cdot v}$;

2. $v \cdot v \ge 0$, and $v \cdot v = 0$ if and only if $v = 0$;

3. $(\alpha u + \beta v) \cdot w = \alpha(u \cdot w) + \beta(v \cdot w)$;

4. $u \cdot (\alpha v + \beta w) = \bar{\alpha}(u \cdot v) + \bar{\beta}(u \cdot w)$;

for all complex numbers $\alpha, \beta$ and all complex vectors $u, v, w$.
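The effect of the conjugate can be seen numerically on the vector $(1, i, 0)$ mentioned above. This is a small sketch; the function names `naive_dot` and `hermitian_dot` and the second vector `w` are illustrative choices, not notation from the text.

```python
def naive_dot(v, w):
    # Real-style formula x1*y1 + x2*y2 + x3*y3, with no conjugation.
    return sum(x * y for x, y in zip(v, w))

def hermitian_dot(v, w):
    # Modified formula: conjugate the coordinates of the second vector.
    return sum(x * y.conjugate() for x, y in zip(v, w))

v = (1, 1j, 0)
w = (2 + 1j, 3, 1j)

naive_vv = naive_dot(v, v)      # 1 + i*i + 0, which vanishes although v != 0
herm_vv = hermitian_dot(v, v)   # 1 + 1 + 0, a positive real number

# Property 1 of the list: v.w is the conjugate of w.v.
conjugate_symmetric = hermitian_dot(v, w) == hermitian_dot(w, v).conjugate()
```

With the unconjugated formula the nonzero vector $v$ has "length" zero; with the conjugated one $v \cdot v = 2$.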

We reiterate that in what follows $F$ is either the field of real or complex
numbers.

DEFINITION The vector space $V$ over $F$ is said to be an inner product
space if there is defined for any two vectors $u, v \in V$ an element $(u, v)$ in
$F$ such that

1. $(u, v) = \overline{(v, u)}$;

2. $(u, u) \ge 0$ and $(u, u) = 0$ if and only if $u = 0$;

3. $(\alpha u + \beta v, w) = \alpha(u, w) + \beta(v, w)$;

for any $u, v, w \in V$ and $\alpha, \beta \in F$.

A few observations about properties 1, 2, and 3 are in order. A function
satisfying them is called an inner product. If $F$ is the field of complex numbers,
property 1 implies that $(u, u)$ is real, and so property 2 makes sense. Using
1 and 3, we see that
$(u, \alpha v + \beta w) = \overline{(\alpha v + \beta w, u)} = \overline{\alpha(v, u) + \beta(w, u)} = \bar{\alpha}\,\overline{(v, u)} + \bar{\beta}\,\overline{(w, u)} = \bar{\alpha}(u, v) + \bar{\beta}(u, w)$.

We pause to look at some examples of inner product spaces. 

Example 4.4.1 In $F^{(n)}$ define, for $u = (\alpha_1, \ldots, \alpha_n)$ and $v = (\beta_1, \ldots, \beta_n)$,
$(u, v) = \alpha_1\bar{\beta}_1 + \alpha_2\bar{\beta}_2 + \cdots + \alpha_n\bar{\beta}_n$. This defines an inner product
on $F^{(n)}$.

Example 4.4.2 In $F^{(2)}$ define for $u = (\alpha_1, \alpha_2)$ and $v = (\beta_1, \beta_2)$,
$(u, v) = 2\alpha_1\bar{\beta}_1 + \alpha_1\bar{\beta}_2 + \alpha_2\bar{\beta}_1 + \alpha_2\bar{\beta}_2$. It is easy to verify that this defines an
inner product on $F^{(2)}$.
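For Example 4.4.2 the only property that is not immediate is positivity. One way to see it (a sketch of the verification, not the text's own argument) is to complete the square in $(u, u)$:

```latex
\begin{align*}
(u, u) &= 2\alpha_1\bar{\alpha}_1 + \alpha_1\bar{\alpha}_2 + \alpha_2\bar{\alpha}_1 + \alpha_2\bar{\alpha}_2 \\
       &= |\alpha_1|^2 + (\alpha_1 + \alpha_2)\overline{(\alpha_1 + \alpha_2)} \\
       &= |\alpha_1|^2 + |\alpha_1 + \alpha_2|^2 \;\ge\; 0 .
\end{align*}
```

Equality forces $\alpha_1 = 0$ and $\alpha_1 + \alpha_2 = 0$, that is, $u = 0$, so property 2 of the definition holds.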

Example 4.4.3 Let $V$ be the set of all continuous complex-valued
functions on the closed unit interval $[0, 1]$. If $f(t), g(t) \in V$, define

\[ (f(t), g(t)) = \int_0^1 f(t)\,\overline{g(t)}\,dt. \]

We leave it to the reader to verify that this defines an inner product on $V$.

For the remainder of this section V will denote an inner product space. 

DEFINITION If $v \in V$ then the length of $v$ (or norm of $v$), written $\|v\|$, is
defined by $\|v\| = \sqrt{(v, v)}$.

LEMMA 4.4.1 If $u, v \in V$ and $\alpha, \beta \in F$ then
$(\alpha u + \beta v, \alpha u + \beta v) = \alpha\bar{\alpha}(u, u) + \alpha\bar{\beta}(u, v) + \bar{\alpha}\beta(v, u) + \beta\bar{\beta}(v, v)$.

Proof. By property 3 defining an inner product space,
$(\alpha u + \beta v, \alpha u + \beta v) = \alpha(u, \alpha u + \beta v) + \beta(v, \alpha u + \beta v)$; but $(u, \alpha u + \beta v) = \bar{\alpha}(u, u) + \bar{\beta}(u, v)$
and $(v, \alpha u + \beta v) = \bar{\alpha}(v, u) + \bar{\beta}(v, v)$. Substituting these in the expression
for $(\alpha u + \beta v, \alpha u + \beta v)$ we get the desired result.

COROLLARY $\|\alpha u\| = |\alpha|\,\|u\|$.

Proof. $\|\alpha u\|^2 = (\alpha u, \alpha u) = \alpha\bar{\alpha}(u, u)$ by Lemma 4.4.1 (with $v = 0$).
Since $\alpha\bar{\alpha} = |\alpha|^2$ and $(u, u) = \|u\|^2$, taking square roots yields $\|\alpha u\| = |\alpha|\,\|u\|$.

We digress for a moment, and prove a very elementary and familiar 
result about real quadratic equations. 

LEMMA 4.4.2 If $a, b, c$ are real numbers such that $a > 0$ and $a\lambda^2 + 2b\lambda + c \ge 0$
for all real numbers $\lambda$, then $b^2 \le ac$.

Proof. Completing the squares,

\[ a\lambda^2 + 2b\lambda + c = \frac{1}{a}(a\lambda + b)^2 + \left(c - \frac{b^2}{a}\right). \]

Since it is greater than or equal to 0 for all $\lambda$, in particular this must be
true for $\lambda = -b/a$. Thus $c - (b^2/a) \ge 0$, and since $a > 0$ we get $b^2 \le ac$.

We now proceed to an extremely important inequality, usually known 
as the Schwarz inequality: 

THEOREM 4.4.1 If $u, v \in V$ then $|(u, v)| \le \|u\|\,\|v\|$.

Proof. If $u = 0$ then both $(u, v) = 0$ and $\|u\|\,\|v\| = 0$, so that the
result is true there.

Suppose, for the moment, that $(u, v)$ is real and $u \neq 0$. By Lemma
4.4.1, for any real number $\lambda$, $0 \le (\lambda u + v, \lambda u + v) = \lambda^2(u, u) + 2(u, v)\lambda + (v, v)$.
Let $a = (u, u)$, $b = (u, v)$, and $c = (v, v)$; for these the
hypothesis of Lemma 4.4.2 is satisfied, so that $b^2 \le ac$. That is, $(u, v)^2 \le (u, u)(v, v)$;
from this it is immediate that $|(u, v)| \le \|u\|\,\|v\|$.

If $\alpha = (u, v)$ is not real, then it certainly is not 0, so that $u/\alpha$ is meaningful.
Now,

\[ \left(\frac{u}{\alpha}, v\right) = \frac{1}{\alpha}(u, v) = \frac{1}{\alpha}\,\alpha = 1, \]

and so it is certainly real. By the case of the Schwarz inequality discussed
in the paragraph above,

\[ 1 = \left|\left(\frac{u}{\alpha}, v\right)\right| \le \left\|\frac{u}{\alpha}\right\|\,\|v\|; \]

since

\[ \left\|\frac{u}{\alpha}\right\| = \frac{1}{|\alpha|}\,\|u\| \]

we get

\[ 1 \le \frac{\|u\|\,\|v\|}{|\alpha|}, \]

whence $|\alpha| \le \|u\|\,\|v\|$. Putting in that $\alpha = (u, v)$ we obtain $|(u, v)| \le \|u\|\,\|v\|$,
the desired result.
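A quick numerical check of the Schwarz inequality in $F^{(3)}$ with the inner product of Example 4.4.1 is sketched below. The vectors and the scalar `c` are arbitrary illustrative choices; the last lines look at the case $v = cu$, where the two sides agree up to rounding.

```python
import math

def inner(u, v):
    # (u, v) = sum of alpha_i * conjugate(beta_i), as in Example 4.4.1.
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))

def norm(u):
    # ||u|| = sqrt((u, u)); (u, u) is real, so only its real part is used.
    return math.sqrt(inner(u, u).real)

u = (1 + 2j, 3, -1j)
v = (2, 1 - 1j, 4)
lhs = abs(inner(u, v))        # |(u, v)|
rhs = norm(u) * norm(v)       # ||u|| ||v||

# When v is a scalar multiple of u the inequality becomes an equality.
c = 2 - 1j
cu = tuple(c * x for x in u)
equality_case = math.isclose(abs(inner(u, cu)), norm(u) * norm(cu))
```

For these vectors $(u, v) = 5 + 3i$, so the left side is $\sqrt{34}$, while the right side is $\sqrt{15}\sqrt{22} = \sqrt{330}$.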

Specific cases of the Schwarz inequality are themselves of great interest. 
We point out two of them. 

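1. If $V = F^{(n)}$ with $(u, v) = \alpha_1\bar{\beta}_1 + \cdots + \alpha_n\bar{\beta}_n$, where $u = (\alpha_1, \ldots, \alpha_n)$
and $v = (\beta_1, \ldots, \beta_n)$, then Theorem 4.4.1 implies that

\[ |\alpha_1\bar{\beta}_1 + \cdots + \alpha_n\bar{\beta}_n|^2 \le (|\alpha_1|^2 + \cdots + |\alpha_n|^2)(|\beta_1|^2 + \cdots + |\beta_n|^2). \]

2. If $V$ is the set of all continuous, complex-valued functions on $[0, 1]$ with
inner product defined by

\[ (f(t), g(t)) = \int_0^1 f(t)\,\overline{g(t)}\,dt, \]

then Theorem 4.4.1 implies that

\[ \left|\int_0^1 f(t)\,\overline{g(t)}\,dt\right|^2 \le \int_0^1 |f(t)|^2\,dt \int_0^1 |g(t)|^2\,dt. \]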


The concept of perpendicularity is an extremely useful and important 
one in geometry. We introduce its analog in general inner product spaces. 


DEFINITION If $u, v \in V$ then $u$ is said to be orthogonal to $v$ if $(u, v) = 0$.

Note that if $u$ is orthogonal to $v$ then $v$ is orthogonal to $u$, for
$(v, u) = \overline{(u, v)} = \bar{0} = 0$.

DEFINITION If $W$ is a subspace of $V$, the orthogonal complement of $W$,
$W^\perp$, is defined by $W^\perp = \{x \in V \mid (x, w) = 0 \text{ for all } w \in W\}$.

LEMMA 4.4.3 $W^\perp$ is a subspace of $V$.

Proof. If $a, b \in W^\perp$ then for all $\alpha, \beta \in F$ and all $w \in W$,
$(\alpha a + \beta b, w) = \alpha(a, w) + \beta(b, w) = 0$ since $a, b \in W^\perp$.

Note that $W \cap W^\perp = (0)$, for if $w \in W \cap W^\perp$ it must be self-orthogonal,
that is $(w, w) = 0$. The defining properties of an inner product space
rule out this possibility unless $w = 0$.

One of our goals is to show that $V = W + W^\perp$. Once this is done,
the remark made above will become of some interest, for it will imply that
$V$ is the direct sum of $W$ and $W^\perp$.

DEFINITION The set of vectors $\{v_i\}$ in $V$ is an orthonormal set if

1. Each $v_i$ is of length 1 (i.e., $(v_i, v_i) = 1$).

2. For $i \neq j$, $(v_i, v_j) = 0$.

LEMMA 4.4.4 If $\{v_i\}$ is an orthonormal set, then the vectors in $\{v_i\}$ are linearly
independent. If $w = \alpha_1v_1 + \cdots + \alpha_nv_n$, then $\alpha_i = (w, v_i)$ for $i = 1, 2, \ldots, n$.

Proof. Suppose that $\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_nv_n = 0$. Therefore
$0 = (\alpha_1v_1 + \cdots + \alpha_nv_n, v_i) = \alpha_1(v_1, v_i) + \cdots + \alpha_n(v_n, v_i)$. Since $(v_j, v_i) = 0$
for $j \neq i$ while $(v_i, v_i) = 1$, this equation reduces to $\alpha_i = 0$. Thus the
$v_j$'s are linearly independent.

If $w = \alpha_1v_1 + \cdots + \alpha_nv_n$ then computing as above yields $(w, v_i) = \alpha_i$.

Similar in spirit and in proof to Lemma 4.4.4 is

LEMMA 4.4.5 If $\{v_1, \ldots, v_n\}$ is an orthonormal set in $V$ and if $w \in V$, then
$u = w - (w, v_1)v_1 - (w, v_2)v_2 - \cdots - (w, v_i)v_i - \cdots - (w, v_n)v_n$ is
orthogonal to each of $v_1, v_2, \ldots, v_n$.

Proof. Computing $(u, v_i)$ for any $i \le n$, using the orthonormality of
$v_1, \ldots, v_n$, yields the result.

The construction carried out in the proof of the next theorem is one which 
appears and reappears in many parts of mathematics. It is a basic pro¬ 
cedure and is known as the Gram-Schmidt orthogonalization process. Although 
we shall be working in a finite-dimensional inner product space, the 
Gram-Schmidt process works equally well in infinite-dimensional situations. 


THEOREM 4.4.2 Let V be a finite-dimensional inner product space; then V has 
an orthonormal set as a basis. 


Proof. Let $V$ be of dimension $n$ over $F$ and let $v_1, \ldots, v_n$ be a basis of $V$.
From this basis we shall construct an orthonormal set of $n$ vectors; by
Lemma 4.4.4 this set is linearly independent so must form a basis of $V$.

We proceed with the construction. We seek $n$ vectors $w_1, \ldots, w_n$ each
of length 1 such that for $i \neq j$, $(w_i, w_j) = 0$. In fact we shall finally
produce them in the following form: $w_1$ will be a multiple of $v_1$, $w_2$ will be
in the linear span of $w_1$ and $v_2$, $w_3$ in the linear span of $w_1$, $w_2$, and $v_3$, and
more generally, $w_i$ in the linear span of $w_1, w_2, \ldots, w_{i-1}, v_i$.

Let

\[ w_1 = \frac{v_1}{\|v_1\|}; \]

then

\[ (w_1, w_1) = \left(\frac{v_1}{\|v_1\|}, \frac{v_1}{\|v_1\|}\right) = \frac{1}{\|v_1\|^2}(v_1, v_1) = 1, \]

whence $\|w_1\| = 1$. We now ask: for what value of $\alpha$ is $\alpha w_1 + v_2$ orthogonal
to $w_1$? All we need is that $(\alpha w_1 + v_2, w_1) = 0$, that is $\alpha(w_1, w_1) + (v_2, w_1) = 0$.
Since $(w_1, w_1) = 1$, $\alpha = -(v_2, w_1)$ will do the trick. Let
$u_2 = -(v_2, w_1)w_1 + v_2$; $u_2$ is orthogonal to $w_1$; since $v_1$ and $v_2$ are linearly
independent, $w_1$ and $v_2$ must be linearly independent, and so $u_2 \neq 0$.
Let $w_2 = u_2/\|u_2\|$; then $\{w_1, w_2\}$ is an orthonormal set. We continue.
Let $u_3 = -(v_3, w_1)w_1 - (v_3, w_2)w_2 + v_3$; a simple check verifies that
$(u_3, w_1) = (u_3, w_2) = 0$. Since $w_1$, $w_2$, and $v_3$ are linearly independent
(for $w_1$, $w_2$ are in the linear span of $v_1$ and $v_2$), $u_3 \neq 0$. Let $w_3 = u_3/\|u_3\|$;
then $\{w_1, w_2, w_3\}$ is an orthonormal set. The road ahead is now clear.
Suppose that we have constructed $w_1, w_2, \ldots, w_i$, in the linear span of
$v_1, \ldots, v_i$, which form an orthonormal set. How do we construct the next
one, $w_{i+1}$? Merely put $u_{i+1} = -(v_{i+1}, w_1)w_1 - (v_{i+1}, w_2)w_2 - \cdots - (v_{i+1}, w_i)w_i + v_{i+1}$.
That $u_{i+1} \neq 0$ and that it is orthogonal to each of
$w_1, \ldots, w_i$ we leave to the reader. Put $w_{i+1} = u_{i+1}/\|u_{i+1}\|$.

In this way, given $r$ linearly independent elements in $V$, we can construct
an orthonormal set having $r$ elements. In particular, when $\dim V = n$,
from any basis of $V$ we can construct an orthonormal set having $n$ elements.
This provides us with the required basis for $V$.

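We illustrate the construction used in the last proof in a concrete case.
Let $F$ be the real field and let $V$ be the set of polynomials, in a variable $x$,
over $F$ of degree 2 or less. In $V$ we define an inner product by: if $p(x)$,
$q(x) \in V$, then

\[ (p(x), q(x)) = \int_{-1}^{1} p(x)q(x)\,dx. \]

Let us start with the basis $v_1 = 1$, $v_2 = x$, $v_3 = x^2$ of $V$. Following the
construction used,

\[ w_1 = \frac{v_1}{\|v_1\|} = \frac{1}{\sqrt{\int_{-1}^{1} 1\,dx}} = \frac{1}{\sqrt{2}}, \]

\[ u_2 = -(v_2, w_1)w_1 + v_2, \]

which after the computations reduces to $u_2 = x$, and so

\[ w_2 = \frac{u_2}{\|u_2\|} = \frac{x}{\sqrt{\int_{-1}^{1} x^2\,dx}} = \frac{\sqrt{3}}{\sqrt{2}}\,x; \]

finally,

\[ u_3 = -(v_3, w_1)w_1 - (v_3, w_2)w_2 + v_3 = -\frac{1}{3} + x^2, \]

and so

\[ w_3 = \frac{u_3}{\|u_3\|} = \frac{\sqrt{10}}{4}(-1 + 3x^2). \]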
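The polynomial example can be reproduced with exact arithmetic. The sketch below assumes the inner product is the integral over $[-1, 1]$ (consistent with $u_2 = x$ above), and it uses the unnormalized variant of the process, $u_k = v_k - \sum_j \frac{(v_k, u_j)}{(u_j, u_j)}u_j$, so that every number stays rational; dividing each $u_k$ by $\sqrt{(u_k, u_k)}$ afterwards gives the $w_k$. Polynomials are stored as coefficient lists from degree 0 upward, and all function names are illustrative.

```python
from fractions import Fraction

def poly_mul(p, q):
    # Product of two polynomials given as coefficient lists (degree 0 first).
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    # Integral of p(x) q(x) over [-1, 1], term by term.
    total = Fraction(0)
    for k, c in enumerate(poly_mul(p, q)):
        if k % 2 == 0:  # odd powers integrate to 0 on [-1, 1]
            total += c * Fraction(2, k + 1)
    return total

def gram_schmidt(vs):
    # Returns orthogonal, not yet normalized, vectors u_1, ..., u_n.
    us = []
    for v in vs:
        u = [Fraction(c) for c in v]
        for prev in us:
            coeff = inner(v, prev) / inner(prev, prev)
            for k, c in enumerate(prev):
                u[k] -= coeff * c
        us.append(u)
    return us

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # 1, x, x^2
u1, u2, u3 = gram_schmidt(basis)
squared_norms = [inner(u, u) for u in (u1, u2, u3)]
```

The squared norms come out as $2$, $2/3$, and $8/45$, which give the normalizing factors $1/\sqrt{2}$, $\sqrt{3}/\sqrt{2}$, and $\sqrt{45/8}$; the last one turns $-\frac{1}{3} + x^2$ into $\frac{\sqrt{10}}{4}(-1 + 3x^2)$.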


We mentioned the next theorem earlier as one of our goals. We are now 
able to prove it. 


THEOREM 4.4.3 If $V$ is a finite-dimensional inner product space and if $W$ is
a subspace of $V$, then $V = W + W^\perp$. More particularly, $V$ is the direct sum of
$W$ and $W^\perp$.

Proof. Because of the highly geometric nature of the result, and because
it is so basic, we give several proofs. The first will make use of Theorem
4.4.2 and some of the earlier lemmas. The second will be motivated geometrically.

First Proof. As a subspace of the inner product space $V$, $W$ is itself an
inner product space (its inner product being that of $V$ restricted to $W$).
Thus we can find an orthonormal set $w_1, \ldots, w_r$ in $W$ which is a basis of $W$.
If $v \in V$, by Lemma 4.4.5, $v_0 = v - (v, w_1)w_1 - (v, w_2)w_2 - \cdots - (v, w_r)w_r$
is orthogonal to each of $w_1, \ldots, w_r$ and so is orthogonal to $W$.
Thus $v_0 \in W^\perp$, and since $v = v_0 + ((v, w_1)w_1 + \cdots + (v, w_r)w_r)$,
$v \in W + W^\perp$. Therefore $V = W + W^\perp$. Since $W \cap W^\perp = (0)$, this sum is
direct.
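The decomposition in the first proof can be carried out explicitly in a small real example. This is a sketch under stated assumptions: the subspace, the vector, and the name `split` are illustrative, and an orthogonal (not normalized) basis $u_i$ of $W$ is used, with coefficient $(v, u_i)/(u_i, u_i)$, which gives the same vector as $(v, w_i)w_i$ for $w_i = u_i/\|u_i\|$ while keeping the arithmetic exact.

```python
from fractions import Fraction

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def split(v, orthogonal_basis):
    """Return (w, v0) with w in W, v0 orthogonal to W, and v = w + v0."""
    w = [Fraction(0)] * len(v)
    for u in orthogonal_basis:
        # c * u equals (v, w_i) w_i for the unit vector w_i = u / ||u||.
        c = Fraction(dot(v, u), dot(u, u))
        w = [wi + c * ui for wi, ui in zip(w, u)]
    v0 = [vi - wi for vi, wi in zip(v, w)]
    return w, v0

W_basis = [(1, 1, 0), (1, -1, 2)]  # mutually orthogonal vectors spanning W
v = (3, 1, 4)
w, v0 = split(v, W_basis)
```

Here $w = (11/3, 1/3, 10/3)$ lies in $W$ and $v_0 = (-2/3, 2/3, 2/3)$ is orthogonal to both basis vectors, so $v = w + v_0$ exhibits $v$ as an element of $W + W^\perp$.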


Second Proof. In this proof we shall assume that F is the field of real 
numbers. The proof works, in almost the same way, for the complex 
numbers; however, it entails a few extra details which might tend to obscure 
the essential ideas used. 

Let $v \in V$; suppose that we could find a vector $w_0 \in W$ such that
$\|v - w_0\| \le \|v - w\|$ for all $w \in W$. We claim that then $(v - w_0, w) = 0$
for all $w \in W$, that is, $v - w_0 \in W^\perp$.

If $w \in W$, then $w_0 + w \in W$, in consequence of which

\[ (v - w_0, v - w_0) \le (v - (w_0 + w), v - (w_0 + w)). \]

However, the right-hand side is $(w, w) + (v - w_0, v - w_0) - 2(v - w_0, w)$,
leading to $2(v - w_0, w) \le (w, w)$ for all $w \in W$. If $m$ is any positive
integer, since $w/m \in W$ we have that

\[ \frac{2}{m}(v - w_0, w) = 2\left(v - w_0, \frac{w}{m}\right) \le \left(\frac{w}{m}, \frac{w}{m}\right) = \frac{1}{m^2}(w, w), \]

and so $2(v - w_0, w) \le (1/m)(w, w)$ for any positive integer $m$. However,
$(1/m)(w, w) \to 0$ as $m \to \infty$, whence $2(v - w_0, w) \le 0$. Similarly, $-w \in W$,
and so $0 \le -2(v - w_0, w) = 2(v - w_0, -w) \le 0$, yielding $(v - w_0, w) = 0$
for all $w \in W$. Thus $v - w_0 \in W^\perp$; hence $v \in w_0 + W^\perp \subset W + W^\perp$.

To finish the second proof we must prove the existence of a $w_0 \in W$
such that $\|v - w_0\| \le \|v - w\|$ for all $w \in W$. We indicate sketchily two
ways of proving the existence of such a $w_0$.

Let $u_1, \ldots, u_k$ be a basis of $W$; thus any $w \in W$ is of the form
$w = \lambda_1u_1 + \cdots + \lambda_ku_k$. Let $\beta_{ij} = (u_i, u_j)$ and let $\gamma_i = (v, u_i)$ for $v \in V$. Thus

\[ (v - w, v - w) = (v - \lambda_1u_1 - \cdots - \lambda_ku_k,\; v - \lambda_1u_1 - \cdots - \lambda_ku_k) = (v, v) + \sum_{i,j} \lambda_i\lambda_j\beta_{ij} - 2\sum_i \lambda_i\gamma_i. \]

This quadratic function in the $\lambda$'s is nonnegative
and so, by results from the calculus, has a minimum. The $\lambda$'s for this
minimum, $\lambda_1^{(0)}, \lambda_2^{(0)}, \ldots, \lambda_k^{(0)}$, give us the desired vector
$w_0 = \lambda_1^{(0)}u_1 + \cdots + \lambda_k^{(0)}u_k$ in $W$.

A second way of exhibiting such a minimizing $w_0$ is as follows. In $V$ define
a metric $\zeta$ by $\zeta(x, y) = \|x - y\|$; one shows that $\zeta$ is a proper metric on $V$,
and $V$ is now a metric space. Let $S = \{w \in W \mid \|v - w\| \le \|v\|\}$; in
this metric $S$ is a compact set (prove!) and so the continuous function
$f(w) = \|v - w\|$ defined for $w \in S$ takes on a minimum at some point
$w_0 \in S$. We leave it to the reader to verify that $w_0$ is the desired vector
satisfying $\|v - w_0\| \le \|v - w\|$ for all $w \in W$.

COROLLARY If $V$ is a finite-dimensional inner product space and $W$ is a subspace
of $V$ then $(W^\perp)^\perp = W$.

Proof. If $w \in W$ then for any $u \in W^\perp$, $(w, u) = 0$, whence $W \subset (W^\perp)^\perp$.
Now $V = W + W^\perp$ and $V = W^\perp + (W^\perp)^\perp$; from these we get,
since the sums are direct, $\dim(W) = \dim((W^\perp)^\perp)$. Since $W \subset (W^\perp)^\perp$
and is of the same dimension as $(W^\perp)^\perp$, it follows that $W = (W^\perp)^\perp$.


Problems 

In all the problems V is an inner product space over F. 

1. If $F$ is the real field and $V$ is $F^{(3)}$, show that the Schwarz inequality
implies that the cosine of an angle is of absolute value at most 1.

2. If $F$ is the real field, find all 4-tuples of real numbers $(a, b, c, d)$ such
that for $u = (\alpha_1, \alpha_2)$, $v = (\beta_1, \beta_2) \in F^{(2)}$,
$(u, v) = a\alpha_1\beta_1 + b\alpha_2\beta_2 + c\alpha_1\beta_2 + d\alpha_2\beta_1$
defines an inner product on $F^{(2)}$.

3. In $V$ define the distance $\zeta(u, v)$ from $u$ to $v$ by $\zeta(u, v) = \|u - v\|$. Prove
that

(a) $\zeta(u, v) \ge 0$ and $\zeta(u, v) = 0$ if and only if $u = v$.

(b) $\zeta(u, v) = \zeta(v, u)$.

(c) $\zeta(u, v) \le \zeta(u, w) + \zeta(w, v)$ (triangle inequality).

4. If $\{w_1, \ldots, w_m\}$ is an orthonormal set in $V$, prove that

\[ \sum_{i=1}^{m} |(w_i, v)|^2 \le \|v\|^2 \quad \text{for any } v \in V. \]

(Bessel inequality)

5. If $V$ is finite-dimensional and if $\{w_1, \ldots, w_m\}$ is an orthonormal set in
$V$ such that

\[ \sum_{i=1}^{m} |(w_i, v)|^2 = \|v\|^2 \]

for every $v \in V$, prove that $\{w_1, \ldots, w_m\}$ must be a basis of $V$.

6. If $\dim V = n$ and if $\{w_1, \ldots, w_m\}$ is an orthonormal set in $V$, prove
that there exist vectors $w_{m+1}, \ldots, w_n$ such that $\{w_1, \ldots, w_m, w_{m+1}, \ldots, w_n\}$
is an orthonormal set (and basis of $V$).

7. Use the result of Problem 6 to give another proof of Theorem 4.4.3.

8. In $V$ prove the parallelogram law:

\[ \|u + v\|^2 + \|u - v\|^2 = 2(\|u\|^2 + \|v\|^2). \]

Explain what this means geometrically in the special case $V = F^{(3)}$,
where $F$ is the real field, and where the inner product is the usual dot
product.

9. Let $V$ be the real functions $y = f(x)$ satisfying $d^2y/dx^2 + 9y = 0$.

(a) Prove that $V$ is a two-dimensional real vector space.

(b) In $V$ define $(y, z) = \int_0^{\pi} yz\,dx$. Find an orthonormal basis in $V$.

10. Let $V$ be the set of real functions $y = f(x)$ satisfying

\[ \frac{d^3y}{dx^3} - 6\frac{d^2y}{dx^2} + 11\frac{dy}{dx} - 6y = 0. \]

(a) Prove that $V$ is a three-dimensional real vector space.

(b) In $V$ define

\[ (u, v) = \int_{-\infty}^{0} uv\,dx. \]

Show that this defines an inner product on $V$ and find an orthonormal
basis for $V$.

11. If $W$ is a subspace of $V$ and if $v \in V$ satisfies $(v, w) + (w, v) \le (w, w)$
for every $w \in W$, prove that $(v, w) = 0$ for every $w \in W$.

12. If $V$ is a finite-dimensional inner product space and if $f$ is a linear
functional on $V$ (i.e., $f \in \hat{V}$), prove that there is a $u_0 \in V$ such that
$f(v) = (v, u_0)$ for all $v \in V$.

4.5 Modules

The notion of a module will be a generalization of that of a vector space; 
instead of restricting the scalars to lie in a field we shall allow them to be 
elements of an arbitrary ring. 

This section has many definitions but only one main theorem. However 
the definitions are so close in spirit to ones already made for vector spaces 
that the main ideas to be developed here should not be buried in a sea of 
definitions. 

DEFINITION Let $R$ be any ring; a nonempty set $M$ is said to be an
$R$-module (or, a module over $R$) if $M$ is an abelian group under an operation
$+$ such that for every $r \in R$ and $m \in M$ there exists an element $rm$ in $M$
subject to

1. $r(a + b) = ra + rb$;

2. $r(sa) = (rs)a$;

3. $(r + s)a = ra + sa$

for all $a, b \in M$ and $r, s \in R$.

If $R$ has a unit element, 1, and if $1m = m$ for every element $m$ in $M$, then
$M$ is called a unital $R$-module. Note that if $R$ is a field, a unital $R$-module
is nothing more than a vector space over $R$. All our modules shall be unital ones.

Properly speaking, we should call the object we have defined a left $R$-module,
for we allow multiplication by the elements of $R$ from the left.
Similarly we could define a right $R$-module. We shall make no such left-right
distinction, it being understood that by the term $R$-module we mean a left
$R$-module.

Example 4.5.1 Every abelian group G is a module over the ring of 
integers! 

For, write the operation of G as + and let na, for a e G and n an integer, 
have the meaning it had in Chapter 2. The usual rules of exponents in 
abelian groups translate into the requisite properties needed to make of G 
a module over the integers. Note that it is a unital module. 
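As a concrete check of Example 4.5.1, the sketch below takes the abelian group of integers modulo 12 (an illustrative choice, as are the names `scalar` and `MOD`) and defines $na$ as repeated addition, with negative $n$ handled by adding $-a$. It then tests the three module properties and unitality over a range of integers.

```python
def scalar(n, a, modulus):
    """n*a in the integers mod `modulus`: a added n times, or -a added |n| times."""
    step = a if n >= 0 else (-a) % modulus
    total = 0
    for _ in range(abs(n)):
        total = (total + step) % modulus
    return total

MOD = 12
elements = range(MOD)
scalars = range(-5, 6)

axioms_hold = all(
    # 1. r(a + b) = ra + rb
    scalar(r, (a + b) % MOD, MOD) == (scalar(r, a, MOD) + scalar(r, b, MOD)) % MOD
    # 2. r(sa) = (rs)a
    and scalar(r, scalar(s, a, MOD), MOD) == scalar(r * s, a, MOD)
    # 3. (r + s)a = ra + sa
    and scalar(r + s, a, MOD) == (scalar(r, a, MOD) + scalar(s, a, MOD)) % MOD
    # unital: 1a = a
    and scalar(1, a, MOD) == a
    for r in scalars for s in scalars for a in elements for b in elements
)
```

A finite check of this kind is of course no proof; it only illustrates how the "rules of exponents", written additively, become the module properties.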

Example 4.5.2 Let $R$ be any ring and let $M$ be a left-ideal of $R$. For
$r \in R$, $m \in M$, let $rm$ be the product of these elements as elements in $R$.
The definition of left-ideal implies that $rm \in M$, while the axioms defining a
ring insure us that $M$ is an $R$-module. (In this example, by a ring we mean
an associative ring, in order to make sure that $r(sm) = (rs)m$.)

Example 4.5.3 The special case in which $M = R$; any ring $R$ is an
$R$-module over itself.

Example 4.5.4 Let $R$ be any ring and let $\lambda$ be a left-ideal of $R$. Let
$M$ consist of all the cosets, $a + \lambda$, where $a \in R$, of $\lambda$ in $R$.

In $M$ define $(a + \lambda) + (b + \lambda) = (a + b) + \lambda$ and $r(a + \lambda) = ra + \lambda$.
$M$ can be shown to be an $R$-module. (See Problem 2, end of this section.)
$M$ is usually written as $R - \lambda$ (or, sometimes, as $R/\lambda$) and is called the
difference (or quotient) module of $R$ by $\lambda$.

An additive subgroup $A$ of the $R$-module $M$ is called a submodule of $M$
if whenever $r \in R$ and $a \in A$, then $ra \in A$.

Given an $R$-module $M$ and a submodule $A$ we could construct the quotient
module $M/A$ in a manner similar to the way we constructed quotient
groups, quotient rings, and quotient spaces. One could also talk about
homomorphisms of one $R$-module into another one, and prove the appropriate
homomorphism theorems. These occur in the problems at the end
of this section.

Our interest in modules is in a somewhat different direction; we shall 
attempt to find a nice decomposition for modules over certain rings. 

DEFINITION If $M$ is an $R$-module and if $M_1, \ldots, M_s$ are submodules
of $M$, then $M$ is said to be the direct sum of $M_1, \ldots, M_s$ if every element
$m \in M$ can be written in a unique manner as $m = m_1 + m_2 + \cdots + m_s$
where $m_1 \in M_1$, $m_2 \in M_2$, $\ldots$, $m_s \in M_s$.

As in the case of vector spaces, if $M$ is the direct sum of $M_1, \ldots, M_s$ then
$M$ will be isomorphic, as a module, to the set of all $s$-tuples, $(m_1, \ldots, m_s)$
where the $i$th component $m_i$ is any element of $M_i$, where addition is componentwise,
and where $r(m_1, \ldots, m_s) = (rm_1, rm_2, \ldots, rm_s)$ for $r \in R$.
Thus, knowing the structure of each $M_i$ would enable us to know the
structure of $M$.

Of particular interest and simplicity are modules generated by one
element; such modules are called cyclic. To be precise:

DEFINITION An $R$-module $M$ is said to be cyclic if there is an element
$m_0 \in M$ such that every $m \in M$ is of the form $m = rm_0$ where $r \in R$.

For $R$, the ring of integers, a cyclic $R$-module is nothing more than a
cyclic group.

We still need one more definition, namely,

DEFINITION An $R$-module $M$ is said to be finitely generated if there exist
elements $a_1, \ldots, a_n \in M$ such that every $m$ in $M$ is of the form
$m = r_1a_1 + r_2a_2 + \cdots + r_na_n$.

With all the needed definitions finally made, we now come to the theorem
which is the primary reason for which this section exists. It is often called
the fundamental theorem on finitely generated modules over Euclidean rings.
In it we shall restrict $R$ to be a Euclidean ring (see Chapter 3, Section 3.7);
however the theorem holds in the more general context in which $R$ is any
principal ideal domain.

THEOREM 4.5.1 Let $R$ be a Euclidean ring; then any finitely generated $R$-module,
$M$, is the direct sum of a finite number of cyclic submodules.

Proof. Before becoming involved with the machinery of the proof, let us
see what the theorem states. The assumption that $M$ is finitely generated
tells us that there is a set of elements $a_1, \ldots, a_n \in M$ such that every element
in $M$ can be expressed in the form $r_1a_1 + r_2a_2 + \cdots + r_na_n$, where
the $r_i \in R$. The conclusion of the theorem states that when $R$ is properly
conditioned we can, in fact, find some other set of elements $b_1, \ldots, b_q$ in
$M$ such that every element $m \in M$ can be expressed in a unique fashion
as $m = s_1b_1 + \cdots + s_qb_q$ with $s_i \in R$. A remark about this uniqueness; it
does not mean that the $s_i$ are unique, in fact this may be false; it merely
states that the elements $s_ib_i$ are. That is, if $m = s_1b_1 + \cdots + s_qb_q$ and
$m = s_1'b_1 + \cdots + s_q'b_q$ we cannot draw the conclusion that $s_1 = s_1'$,
$s_2 = s_2'$, $\ldots$, $s_q = s_q'$, but rather, we can infer from this that $s_1b_1 = s_1'b_1$,
$\ldots$, $s_qb_q = s_q'b_q$.

Another remark before we start with the technical argument. Although
the theorem is stated for a general Euclidean ring, we shall give the proof in
all its detail only for the special case of the ring of integers. At the end we
shall indicate the slight modifications needed to make the proof go through
for the more general setting. We have chosen this path to avoid cluttering
up the essential ideas, which are the same in the general case, with some
technical niceties which are of no importance.

Thus we are simply assuming that $M$ is an abelian group which has a
finite generating set. Let us call those generating sets having as few elements
as possible minimal generating sets and the number of elements in such a
minimal generating set the rank of $M$.

Our proof now proceeds by induction on the rank of $M$.

If the rank of $M$ is 1 then $M$ is generated by a single element, hence it is
cyclic; in this case the theorem is true. Suppose that the result is true for all
abelian groups of rank $q - 1$, and that $M$ is of rank $q$.

Given any minimal generating set $a_1, \ldots, a_q$ of $M$, if any relation of the
form $n_1a_1 + n_2a_2 + \cdots + n_qa_q = 0$ ($n_1, \ldots, n_q$ integers) implies that
$n_1a_1 = n_2a_2 = \cdots = n_qa_q = 0$, then $M$ is the direct sum of $M_1, M_2, \ldots, M_q$
where each $M_i$ is the cyclic module (i.e., subgroup) generated by $a_i$, and
so we would be done. Consequently, given any minimal generating set
$b_1, \ldots, b_q$ of $M$, there must be integers $r_1, \ldots, r_q$ such that $r_1b_1 + \cdots + r_qb_q = 0$
and in which not all of $r_1b_1, r_2b_2, \ldots, r_qb_q$ are 0. Among all
possible such relations for all minimal generating sets there is a smallest
possible positive integer occurring as a coefficient. Let this integer be $s_1$
and let the generating set for which it occurs be $a_1, \ldots, a_q$. Thus

\[ s_1a_1 + s_2a_2 + \cdots + s_qa_q = 0. \tag{1} \]

We claim that if $r_1a_1 + \cdots + r_qa_q = 0$, then $s_1 \mid r_1$; for $r_1 = ms_1 + t$,
$0 \le t < s_1$, and so multiplying Equation (1) by $m$ and subtracting from
$r_1a_1 + \cdots + r_qa_q = 0$ leads to $ta_1 + (r_2 - ms_2)a_2 + \cdots + (r_q - ms_q)a_q = 0$;
since $t < s_1$ and $s_1$ is the minimal possible positive integer in such a
relation, we must have that $t = 0$.

We now further claim that $s_1 \mid s_i$ for $i = 2, \ldots, q$. Suppose not; then
$s_1 \nmid s_2$, say, so $s_2 = m_2s_1 + t$, $0 < t < s_1$. Now $a_1' = a_1 + m_2a_2, a_2, \ldots, a_q$
also generate $M$, yet $s_1a_1' + ta_2 + s_3a_3 + \cdots + s_qa_q = 0$; thus $t$ occurs
as a coefficient in some relation among elements of a minimal generating
set. But this forces, by the very choice of $s_1$, that either $t = 0$ or $t \ge s_1$.
We are left with $t = 0$ and so $s_1 \mid s_2$. Similarly for the other $s_i$. Let us
write $s_i = m_is_1$.

Consider the elements $a_1^* = a_1 + m_2a_2 + m_3a_3 + \cdots + m_qa_q, a_2, \ldots, a_q$.
They generate $M$; moreover, $s_1a_1^* = s_1a_1 + m_2s_1a_2 + \cdots + m_qs_1a_q = s_1a_1 + s_2a_2 + \cdots + s_qa_q = 0$.
If $r_1a_1^* + r_2a_2 + \cdots + r_qa_q = 0$, substituting
for $a_1^*$, we get a relation between $a_1, \ldots, a_q$ in which the coefficient of
$a_1$ is $r_1$; thus $s_1 \mid r_1$ and so $r_1a_1^* = 0$. If $M_1$ is the cyclic module generated
by $a_1^*$ and if $M_2$ is the submodule of $M$ generated by $a_2, \ldots, a_q$, we have
just shown that $M_1 \cap M_2 = (0)$. But $M_1 + M_2 = M$ since $a_1^*, a_2, \ldots, a_q$
generate $M$. Thus $M$ is the direct sum of $M_1$ and $M_2$. Since $M_2$ is generated
by $a_2, \ldots, a_q$, its rank is at most $q - 1$ (in fact, it is $q - 1$), so by the
induction $M_2$ is the direct sum of cyclic modules. Putting the pieces together
we have decomposed $M$ into a direct sum of cyclic modules.
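A small worked instance may help in following the change of generators used in the proof. The group chosen below, and the names $e_1, e_2$, are illustrative assumptions and do not come from the text: $M$ is the abelian group with two generators $a_1, a_2$ and the single defining relation $2a_1 + 4a_2 = 0$.

```latex
% M = Z^2 / <(2,4)>, with a_1, a_2 the images of e_1 = (1,0), e_2 = (0,1).
\begin{align*}
&\text{Relation (1): } 2a_1 + 4a_2 = 0, \qquad s_1 = 2,\quad s_2 = 4 = m_2 s_1 \text{ with } m_2 = 2, \\
&a_1^* = a_1 + m_2 a_2 = a_1 + 2a_2, \qquad s_1 a_1^* = 2a_1 + 4a_2 = 0, \\
&M = M_1 \oplus M_2, \qquad M_1 = \langle a_1^* \rangle \cong \mathbb{Z}/2\mathbb{Z}, \qquad M_2 = \langle a_2 \rangle \cong \mathbb{Z}.
\end{align*}
```

Every relation among two generators of this $M$ has even coefficients, so 2 is indeed the smallest positive coefficient that can occur, and replacing $a_1$ by $a_1^*$ splits off a cyclic summand of order 2 exactly as in the argument above.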

COROLLARY Any finite abelian group is the direct product (sum) of cyclic 
groups. 

Proof. The finite abelian group G is certainly finitely generated; in 
fact it is generated by the finite set consisting of all its elements. Therefore 
applying Theorem 4.5.1 yields the corollary. This is, of course, the result 
proved in Theorem 2.14.1. 

Suppose that $R$ is a Euclidean ring with Euclidean function $d$. We
modify the proof given for the integers to one for $R$ as follows:

1. Instead of choosing $s_1$ as the smallest possible positive integer occurring
in any relation among elements of a generating set, pick it as that element
of $R$ occurring in any relation whose $d$-value is minimal.

2. In the proof that $s_1 \mid r_1$ for any relation $r_1a_1 + \cdots + r_qa_q = 0$, the
only change needed is that $r_1 = ms_1 + t$ where either

\[ t = 0 \quad \text{or} \quad d(t) < d(s_1); \]

the rest goes through. Similarly for the proof that $s_1 \mid s_i$.

Thus with these minor changes the proof holds for general Euclidean
rings, whereby Theorem 4.5.1 is completely proved.

Problems 

1. Verify that the statement made in Example 4.5.1 that every abelian
group is a module over the ring of integers is true.

2. Verify that the set in Example 4.5.4 is an R-module.

3. Suppose that R is a ring with a unit element and that M is a module
over R but is not unital. Prove that there exists an $m \neq 0$ in M such
that $rm = 0$ for all $r \in R$.

Given two R-modules M and N then the mapping T from M into N is
called a homomorphism (or R-homomorphism or module homomorphism) if

1. $(m_1 + m_2)T = m_1 T + m_2 T$;

2. $(r m_1)T = r(m_1 T)$;

for all $m_1, m_2 \in M$ and all $r \in R$.

4. If T is a homomorphism of M into N let $K(T) = \{x \in M \mid xT = 0\}$.
Prove that $K(T)$ is a submodule of M and that $I(T) = \{xT \mid x \in M\}$
is a submodule of N.

5. The homomorphism T is said to be an isomorphism if it is one-to-one. 
Prove that T is an isomorphism if and only if K(T) = (0). 

6. Let M, N, Q be three R-modules, and let T be a homomorphism of
M into N and S a homomorphism of N into Q. Define $TS: M \to Q$
by $m(TS) = (mT)S$ for any $m \in M$. Prove that TS is an R-homomorphism
of M into Q and determine its kernel, $K(TS)$.

7. If M is an R-module and A is a submodule of M, define the quotient
module M/A (use the analogs in groups, rings, and vector spaces as a
guide) so that it is an R-module and prove that there is an R-homomorphism
of M onto M/A.

8. If T is a homomorphism of M onto N with $K(T) = A$, prove that N
is isomorphic (as a module) to M/A.

9. If A and B are submodules of M prove

(a) $A \cap B$ is a submodule of M.

(b) $A + B = \{a + b \mid a \in A, b \in B\}$ is a submodule of M.

(c) $(A + B)/B$ is isomorphic to $A/(A \cap B)$.

10. An R-module M is said to be irreducible if its only submodules are (0)
and M. Prove that any unital, irreducible R-module is cyclic.

11. If M is an irreducible R-module, prove that either M is cyclic or that
for every $m \in M$ and $r \in R$, $rm = 0$.

*12. If M is an irreducible R-module such that $rm \neq 0$ for some $r \in R$
and $m \in M$, prove that any R-homomorphism T of M into M is either
an isomorphism of M onto M or that $mT = 0$ for every $m \in M$.

13. Let M be an R-module and let E(M) be the set of all R-homomorphisms
of M into M. Make appropriate definitions of addition and multiplication
of elements of E(M) so that E(M) becomes a ring. (Hint:
imitate what has been done for Hom(V, V), V a vector space.)

*14. If M is an irreducible R-module such that $rm \neq 0$ for some $r \in R$
and $m \in M$, prove that E(M) is a division ring. (This result is known
as Schur’s lemma.)

15. Give a complete proof of Theorem 4.5.1 for finitely generated modules 
over Euclidean rings. 

16. Let M be an R-module; if $m \in M$ let $\lambda(m) = \{x \in R \mid xm = 0\}$.
Show that $\lambda(m)$ is a left-ideal of R. It is called the order of m.

17. If $\lambda$ is a left-ideal of R and if M is an R-module, show that for $m \in M$,
$\lambda m = \{xm \mid x \in \lambda\}$ is a submodule of M.

*18. Let M be an irreducible R-module in which $rm \neq 0$ for some $r \in R$
and $m \in M$. Let $m_0 \neq 0 \in M$ and let $\lambda(m_0) = \{x \in R \mid x m_0 = 0\}$.

(a) Prove that $\lambda(m_0)$ is a maximal left-ideal of R (that is, if $\lambda$ is a
left-ideal of R such that $R \supset \lambda \supset \lambda(m_0)$, then $\lambda = R$ or
$\lambda = \lambda(m_0)$).

(b) As R-modules, prove that M is isomorphic to $R - \lambda(m_0)$ (see
Example 4.5.4).

Supplementary Reading 

Halmos, Paul R., Finite-Dimensional Vector Spaces, 2nd ed. Princeton, N.J.: D. Van 
Nostrand Company, Inc., 1958. 



Fields 


In our discussion of rings we have already singled out a special class 
which we called fields. A field, let us recall, is a commutative ring 
with unit element in which every nonzero element has a multiplicative 
inverse. Put another way, a field is a commutative ring in which we 
can divide by any nonzero element. 

Fields play a central role in algebra. For one thing, results about 
them find important applications in the theory of numbers. For 
another, their theory encompasses the subject matter of the theory of
equations which treats questions about the roots of polynomials.

In our development we shall touch only lightly on the field of 
algebraic numbers. Instead, our greatest emphasis will be on aspects 
of field theory which impinge on the theory of equations. Although 
we shall not treat the material in its fullest or most general form, we 
shall go far enough to introduce some of the beautiful ideas, due to 
the brilliant French mathematician Evariste Galois, which have 
served as a guiding inspiration for algebra as it is today.

5.1 Extension Fields 

In this section we shall be concerned with the relation of one field to 
another. Let F be a field; a field K is said to be an extension of F if K 
contains F. Equivalently, K is an extension of F if F is a subfield of K. 
Throughout this chapter F will denote a given field and K an extension of F. 

As was pointed out earlier, in the chapter on vector spaces, if K is
an extension of F, then, under the ordinary field operations in K, K is a vector
space over F. As a vector space we may talk about linear dependence,
dimension, bases, etc., in K relative to F.

DEFINITION The degree of K over F is the dimension of K as a vector 
space over F. 

We shall always denote the degree of K over F by [K:F]. Of particular
interest to us is the case in which [K:F] is finite, that is, when K is
finite-dimensional as a vector space over F. This situation is described by
saying that K is a finite extension of F.

We start off with a relatively simple but, at the same time, highly effective 
result about finite extensions, namely, 

THEOREM 5.1.1 If L is a finite extension of K and if K is a finite extension of
F, then L is a finite extension of F. Moreover, [L:F] = [L:K][K:F].

Proof. The strategy we employ in the proof is to write down explicitly
a basis of L over F. In this way not only do we show that L is a finite
extension of F, but we actually prove the sharper result and the one which
is really the heart of the theorem, namely that [L:F] = [L:K][K:F].

Suppose, then, that [L:K] = m and that [K:F] = n. Let $v_1, \ldots, v_m$
be a basis of L over K and let $w_1, \ldots, w_n$ be a basis of K over F. What
could possibly be nicer or more natural than to have the elements $v_i w_j$,
where $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, serve as a basis of L over F?
Whatever else, they do at least provide us with the right number of elements.
We now proceed to show that they do in fact form a basis of L over F.
What do we need to establish this? First we must show that every element
in L is a linear combination of them with coefficients in F, and then we
must demonstrate that these mn elements are linearly independent over F.

Let t be any element in L. Since every element in L is a linear combination
of $v_1, \ldots, v_m$ with coefficients in K, in particular, t must be of this form.
Thus $t = k_1 v_1 + \cdots + k_m v_m$, where the elements $k_1, \ldots, k_m$ are all in K.
However, every element in K is a linear combination of $w_1, \ldots, w_n$ with
coefficients in F. Thus $k_1 = f_{11} w_1 + \cdots + f_{1n} w_n$, ...,
$k_i = f_{i1} w_1 + \cdots + f_{in} w_n$, ..., $k_m = f_{m1} w_1 + \cdots + f_{mn} w_n$,
where every $f_{ij}$ is in F.

Substituting these expressions for $k_1, \ldots, k_m$ into $t = k_1 v_1 + \cdots + k_m v_m$,
we obtain

$$t = (f_{11} w_1 + \cdots + f_{1n} w_n) v_1 + \cdots + (f_{m1} w_1 + \cdots + f_{mn} w_n) v_m.$$

Multiplying this out, using the distributive and associative laws, we finally
arrive at $t = f_{11} v_1 w_1 + \cdots + f_{1n} v_1 w_n + \cdots + f_{ij} v_i w_j + \cdots + f_{mn} v_m w_n$.
Since the $f_{ij}$ are in F, we have realized t as a linear combination over F of
the elements $v_i w_j$. Therefore, the elements $v_i w_j$ do indeed span all of L over
F, and so they fulfill the first requisite property of a basis.

We still must show that the elements $v_i w_j$ are linearly independent over F.

Suppose that $f_{11} v_1 w_1 + \cdots + f_{1n} v_1 w_n + \cdots + f_{ij} v_i w_j + \cdots + f_{mn} v_m w_n = 0$,
where the $f_{ij}$ are in F. Our objective is to prove that each $f_{ij} = 0$.
Regrouping the above expression yields

$$(f_{11} w_1 + \cdots + f_{1n} w_n) v_1 + \cdots + (f_{i1} w_1 + \cdots + f_{in} w_n) v_i + \cdots + (f_{m1} w_1 + \cdots + f_{mn} w_n) v_m = 0.$$

Since the $w_i$ are in K, and since $K \supset F$, all the elements
$k_i = f_{i1} w_1 + \cdots + f_{in} w_n$ are in K. Now $k_1 v_1 + \cdots + k_m v_m = 0$ with
$k_1, \ldots, k_m \in K$. But, by assumption, $v_1, \ldots, v_m$ form a basis of L over K,
so, in particular they must be linearly independent over K. The net result of
this is that $k_1 = k_2 = \cdots = k_m = 0$. Using the explicit values of the $k_i$, we get

$$f_{i1} w_1 + \cdots + f_{in} w_n = 0 \quad \text{for } i = 1, 2, \ldots, m.$$

But now we invoke the fact that the $w_i$ are linearly independent over F;
this yields that each $f_{ij} = 0$. In other words, we have proved that the
$v_i w_j$ are linearly independent over F. In this way they satisfy the other
requisite property for a basis.

We have now succeeded in proving that the mn elements $v_i w_j$ form a
basis of L over F. Thus [L:F] = mn; since m = [L:K] and n = [K:F]
we have obtained the desired result [L:F] = [L:K][K:F].
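
The product basis used in the proof can be tried out on a small tower. The sketch below assumes, purely for illustration, F the rationals, K obtained by adjoining the square root of 3 (basis $w_1 = 1$, $w_2 = \sqrt{3}$ over F) and L obtained from K by adjoining the square root of 5 (basis $v_1 = 1$, $v_2 = \sqrt{5}$ over K). An element of L is stored as the table of its coordinates $f_{ij}$ on the mn = 4 products $v_i w_j$; the function name `mul` and this representation are hypothetical choices, not the text's.

```python
from fractions import Fraction
from itertools import product

def mul(s, t):
    """Multiply two elements of L given as 2x2 coordinate tables f[i][j] on v_i * w_j.

    Indices are 0-based: v_0 = 1, v_1 = sqrt 5, w_0 = 1, w_1 = sqrt 3.
    """
    out = [[Fraction(0), Fraction(0)], [Fraction(0), Fraction(0)]]
    for i, j, k, l in product(range(2), repeat=4):
        coeff = Fraction(s[i][j]) * Fraction(t[k][l])
        # v_i * v_k = 5^(i*k) * v_(i xor k) and w_j * w_l = 3^(j*l) * w_(j xor l).
        coeff *= 5 ** (i * k) * 3 ** (j * l)
        out[i ^ k][j ^ l] += coeff
    return out

sqrt3 = [[0, 1], [0, 0]]   # the product v_0 * w_1
sqrt5 = [[0, 0], [1, 0]]   # the product v_1 * w_0
u = [[0, 1], [1, 0]]       # sqrt 3 + sqrt 5

print(mul(sqrt3, sqrt5))   # coordinates of sqrt 15 = v_1 * w_1
print(mul(u, u))           # coordinates of (sqrt 3 + sqrt 5)^2
```

Every product of two such tables is again such a table, which is the spanning half of the argument in this example; the independence half is what the proof above supplies.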

Suppose that L, K, F are three fields in the relation $L \supset K \supset F$ and,
suppose further that [L:F] is finite. Clearly, any elements in L linearly
independent over K are, all the more so, linearly independent over F.
Thus the assumption that [L:F] is finite forces the conclusion that [L:K]
is finite. Also, since K is a subspace of L, [K:F] is finite. By the theorem,
[L:F] = [L:K][K:F], whence [K:F] | [L:F]. We have proved the

COROLLARY If L is a finite extension of F and K is a subfield of L which
contains F, then [K:F] | [L:F].

Thus, for instance, if [L:F] is a prime number, then there can be no
fields properly between F and L. A little later, in Section 5.4, when we
discuss the construction of certain geometric figures by straightedge and
compass, this corollary will be of great significance.

DEFINITION An element $a \in K$ is said to be algebraic over F if there exist
elements $\alpha_0, \alpha_1, \ldots, \alpha_n$ in F, not all 0, such that
$\alpha_0 a^n + \alpha_1 a^{n-1} + \cdots + \alpha_n = 0$.

If the polynomial $q(x) \in F[x]$, the ring of polynomials in x over F, and if
$q(x) = \beta_0 x^m + \beta_1 x^{m-1} + \cdots + \beta_m$, then for any element $b \in K$, by $q(b)$
we shall mean the element $\beta_0 b^m + \beta_1 b^{m-1} + \cdots + \beta_m$ in K. In the
expression commonly used, $q(b)$ is the value of the polynomial $q(x)$ obtained
by substituting b for x. The element b is said to satisfy $q(x)$ if $q(b) = 0$.

In these terms, $a \in K$ is algebraic over F if there is a nonzero polynomial
$p(x) \in F[x]$ which a satisfies, that is, for which $p(a) = 0$.

Let K be an extension of F and let a be in K. Let $\mathcal{M}$ be the collection of
all subfields of K which contain both F and a. $\mathcal{M}$ is not empty, for K itself
is an element of $\mathcal{M}$. Now, as is easily proved, the intersection of any number
of subfields of K is again a subfield of K. Thus the intersection of all those
subfields of K which are members of $\mathcal{M}$ is a subfield of K. We denote this
subfield by F(a). What are its properties? Certainly it contains both F
and a, since this is true for every subfield of K which is a member of $\mathcal{M}$.
Moreover, by the very definition of intersection, every subfield of K in $\mathcal{M}$
contains F(a), yet F(a) itself is in $\mathcal{M}$. Thus F(a) is the smallest subfield of K
containing both F and a. We call F(a) the subfield obtained by adjoining a to F.

Our description of F(a), so far, has been purely an external one. We now
give an alternative and more constructive description of F(a). Consider all
those elements in K which can be expressed in the form
$\beta_0 + \beta_1 a + \cdots + \beta_s a^s$; here the $\beta$'s can range freely over F and s can
be any nonnegative integer. As elements in K, one such element can be divided
by another, provided the latter is not 0. Let U be the set of all such
quotients. We leave it as an exercise to prove that U is a subfield of K.

On one hand, U certainly contains F and a, whence $U \supset F(a)$. On
the other hand, any subfield of K which contains both F and a, by virtue
of closure under addition and multiplication, must contain all the elements
$\beta_0 + \beta_1 a + \cdots + \beta_s a^s$ where each $\beta_i \in F$. Thus F(a) must contain all
these elements; being a subfield of K, F(a) must also contain all quotients
of such elements. Therefore, $F(a) \supset U$. The two relations $U \subset F(a)$,
$U \supset F(a)$ of course imply that U = F(a). In this way we have obtained
an internal construction of F(a), namely as U.

We now intertwine the property that $a \in K$ is algebraic over F with
macroscopic properties of the field F(a) itself. This is

THEOREM 5.1.2 The element $a \in K$ is algebraic over F if and only if F(a)
is a finite extension of F.

Proof. As is so very common with so many such “if and only if” propositions,
one-half of the proof will be quite straightforward and easy,
whereas the other half will be deeper and more complicated.

Suppose that F(a) is a finite extension of F and that [F(a):F] = m.
Consider the elements $1, a, a^2, \ldots, a^m$; they are all in F(a) and are m + 1
in number. By Lemma 4.2.4, these elements are linearly dependent over
F. Therefore, there are elements $\alpha_0, \alpha_1, \ldots, \alpha_m$ in F, not all 0, such that
$\alpha_0 1 + \alpha_1 a + \alpha_2 a^2 + \cdots + \alpha_m a^m = 0$. Hence a is algebraic over F and
satisfies the nonzero polynomial $p(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_m x^m$ in F[x]
of degree at most m = [F(a):F]. This proves the “if” part of the theorem.
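
The dependence argument can be carried out by hand in a two-dimensional case. The following sketch assumes, for illustration only, F the rationals and $a = 1 + \sqrt{2}$, with elements stored by their coordinates on the basis $(1, \sqrt{2})$; the helper `mul` and the variable names are hypothetical.

```python
from fractions import Fraction

def mul(s, t):
    """(x1 + y1*sqrt 2)(x2 + y2*sqrt 2) in coordinates on the basis (1, sqrt 2)."""
    (x1, y1), (x2, y2) = s, t
    return (x1 * x2 + 2 * y1 * y2, x1 * y2 + y1 * x2)

one = (Fraction(1), Fraction(0))
a = (Fraction(1), Fraction(1))      # a = 1 + sqrt 2
a2 = mul(a, a)                      # a^2 in the same coordinates

# 1, a, a^2 are three vectors in a 2-dimensional space over the rationals,
# hence linearly dependent: solve a^2 = c1*a + c0*1 for c0, c1.
c1 = a2[1] / a[1]
c0 = a2[0] - c1 * a[0]

# Check the dependence relation coordinate by coordinate.
relation_holds = all(a2[i] == c1 * a[i] + c0 * one[i] for i in range(2))
print(c0, c1, relation_holds)       # a satisfies x^2 - c1*x - c0
```

Here m = 2 and the m + 1 = 3 powers of a give a nonzero polynomial of degree at most 2 satisfied by a, exactly as in the "if" part.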

Now to the “only if” part. Suppose that a in K is algebraic over F. By
assumption, a satisfies some nonzero polynomial in F[x]; let $p(x)$ be a
polynomial in F[x] of smallest positive degree such that $p(a) = 0$. We
claim that $p(x)$ is irreducible over F. For, suppose that $p(x) = f(x)g(x)$,
where $f(x), g(x) \in F[x]$; then $0 = p(a) = f(a)g(a)$ (see Problem 1) and,
since $f(a)$ and $g(a)$ are elements of the field K, the fact that their product
is 0 forces $f(a) = 0$ or $g(a) = 0$. Since $p(x)$ is of lowest positive degree
with $p(a) = 0$, we must conclude that one of $\deg f(x) \geq \deg p(x)$ or
$\deg g(x) \geq \deg p(x)$ must hold. But this proves the irreducibility of $p(x)$.

We define the mapping $\psi$ from F[x] into F(a) as follows. For any
$h(x) \in F[x]$, $h(x)\psi = h(a)$. We leave it to the reader to verify that $\psi$ is a
ring homomorphism of the ring F[x] into the field F(a) (see Problem 1).
What is V, the kernel of $\psi$? By the very definition of $\psi$,
$V = \{h(x) \in F[x] \mid h(a) = 0\}$. Also, $p(x)$ is an element of lowest degree in the
ideal V of F[x]. By the results of Section 3.9, every element in V is a multiple
of $p(x)$, and since $p(x)$ is irreducible, by Lemma 3.9.6, V is a maximal ideal
of F[x]. By Theorem 3.5.1, F[x]/V is a field. Now by the general homomorphism
theorem for rings (Theorem 3.4.1), F[x]/V is isomorphic to the
image of F[x] under $\psi$. Summarizing, we have shown that the image of
F[x] under $\psi$ is a subfield of F(a). This image contains $x\psi = a$ and, for
every $\alpha \in F$, $\alpha\psi = \alpha$. Thus the image of F[x] under $\psi$ is a subfield of
F(a) which contains both F and a; by the very definition of F(a) we are
forced to conclude that the image of F[x] under $\psi$ is all of F(a). Put more
succinctly, F[x]/V is isomorphic to F(a).

Now, $V = (p(x))$, the ideal generated by $p(x)$; from this we claim that
the dimension of F[x]/V, as a vector space over F, is precisely equal to
$\deg p(x)$ (see Problem 2). In view of the isomorphism between F[x]/V and
F(a) we obtain the fact that $[F(a):F] = \deg p(x)$. Therefore, [F(a):F] is
certainly finite; this is the contention of the “only if” part of the theorem.

Note that we have actually proved more, namely that [F(a):F] is equal to
the degree of the polynomial of least degree satisfied by a over F.

The proof we have just given has been somewhat long-winded, but
deliberately so. The route followed contains important ideas and ties in
results and concepts developed earlier with the current exposition. No part
of mathematics is an island unto itself.

We now redo the “only if” part, working more on the inside of F(a).
This reworking is, in fact, really identical with the proof already given; the
constituent pieces are merely somewhat differently garbed.

Again let $p(x)$ be a polynomial over F of lowest positive degree satisfied
by a. Such a polynomial is called a minimal polynomial for a over F. We
may assume that its coefficient of the highest power of x is 1, that is, it is
monic; in that case we can speak of the minimal polynomial for a over F
for any two minimal, monic polynomials for a over F are equal. (Prove!)

Suppose that $p(x)$ is of degree n; thus $p(x) = x^n + \alpha_1 x^{n-1} + \cdots + \alpha_n$
where the $\alpha_i$ are in F. By assumption, $a^n + \alpha_1 a^{n-1} + \cdots + \alpha_n = 0$,
whence $a^n = -\alpha_1 a^{n-1} - \alpha_2 a^{n-2} - \cdots - \alpha_n$. What about $a^{n+1}$? From
the above, $a^{n+1} = -\alpha_1 a^n - \alpha_2 a^{n-1} - \cdots - \alpha_n a$; if we substitute the
expression for $a^n$ into the right-hand side of this relation, we realize $a^{n+1}$
as a linear combination of the elements $1, a, \ldots, a^{n-1}$ over F. Continuing
this way, we get that $a^{n+k}$, for $k \geq 0$, is a linear combination over
F of $1, a, a^2, \ldots, a^{n-1}$.

Now consider $T = \{\beta_0 + \beta_1 a + \cdots + \beta_{n-1} a^{n-1} \mid \beta_0, \beta_1, \ldots, \beta_{n-1} \in F\}$.

Clearly, T is closed under addition; in view of the remarks made in the 
paragraph above, it is also closed under multiplication. Whatever further 
it may be, T has at least been shown to be a ring. Moreover, T contains 
both F and a. We now wish to show that T is more than just a ring, that 
it is, in fact, a field. 

Let $0 \neq u = \beta_0 + \beta_1 a + \cdots + \beta_{n-1} a^{n-1}$ be in T and let
$h(x) = \beta_0 + \beta_1 x + \cdots + \beta_{n-1} x^{n-1} \in F[x]$. Since $u \neq 0$, and $u = h(a)$, we
have that $h(a) \neq 0$, whence $p(x) \nmid h(x)$. By the irreducibility of $p(x)$, $p(x)$
and $h(x)$ must therefore be relatively prime. Hence we can find polynomials
$s(x)$ and $t(x)$ in F[x] such that $p(x)s(x) + h(x)t(x) = 1$. But then
$1 = p(a)s(a) + h(a)t(a) = h(a)t(a)$, since $p(a) = 0$; putting into this that
$u = h(a)$, we obtain $u\,t(a) = 1$. The inverse of u is thus $t(a)$; in $t(a)$ all
powers of a higher than n - 1 can be replaced by linear combinations of
$1, a, \ldots, a^{n-1}$ over F, whence $t(a) \in T$. We have shown that every nonzero
element of T has its inverse in T; consequently, T is a field. However,
$T \subset F(a)$, yet F and a are both contained in T, which results in T = F(a).
We have identified F(a) as the set of all expressions
$\beta_0 + \beta_1 a + \cdots + \beta_{n-1} a^{n-1}$.

Now T is spanned over F by the elements $1, a, \ldots, a^{n-1}$ in consequence
of which $[T:F] \leq n$. However, the elements $1, a, a^2, \ldots, a^{n-1}$ are
linearly independent over F, for any relation of the form
$\gamma_0 + \gamma_1 a + \cdots + \gamma_{n-1} a^{n-1} = 0$, with the elements $\gamma_i \in F$, leads to the
conclusion that a satisfies the polynomial $\gamma_0 + \gamma_1 x + \cdots + \gamma_{n-1} x^{n-1}$ over F
of degree less than n. This contradiction proves the linear independence of
$1, a, \ldots, a^{n-1}$, and so these elements actually form a basis of T over F,
whence, in fact, we now know that [T:F] = n. Since T = F(a), the result
[F(a):F] = n follows.
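
The inverse $t(a)$ produced by the relation $p(x)s(x) + h(x)t(x) = 1$ can be computed with the Euclidean algorithm in F[x]. The sketch below assumes F is the field of rationals, stores a polynomial as its list of coefficients from the constant term upward, and uses hypothetical helper names (`poly`, `add`, `mul`, `divmod_poly`, `inverse_mod`); the sample $p(x) = x^3 - 2$ and $u = 1 + a$ are illustrative choices, not taken from the text.

```python
from fractions import Fraction

def poly(coeffs):
    """Coefficient list (constant term first) with exact entries and no trailing zeros."""
    f = [Fraction(c) for c in coeffs]
    while f and f[-1] == 0:
        f.pop()
    return f

def add(f, g):
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return poly([x + y for x, y in zip(f, g)])

def mul(f, g):
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return poly(out)

def divmod_poly(f, g):
    """Division with remainder in F[x]: f = q*g + r with r = 0 or deg r < deg g."""
    f, g = poly(f), poly(g)
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = list(f)
    while len(r) >= len(g):
        shift = len(r) - len(g)
        c = r[-1] / g[-1]
        q[shift] = c
        for i, y in enumerate(g):
            r[i + shift] -= c * y
        r = poly(r)
    return poly(q), r

def inverse_mod(h, p):
    """Return t with h*t leaving remainder 1 on division by p (p irreducible, p not dividing h)."""
    # Invariant: r0 and r1 are congruent to t0*h and t1*h modulo p.
    r0, r1 = poly(p), poly(h)
    t0, t1 = [], poly([1])
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        t0, t1 = t1, add(t0, mul(poly([-1]), mul(q, t1)))
    c = r0[0]  # the greatest common divisor is a nonzero constant
    return divmod_poly([x / c for x in t0], p)[1]

p = poly([-2, 0, 0, 1])      # p(x) = x^3 - 2, so a is a cube root of 2
h = poly([1, 1])             # u = h(a) = 1 + a
t = inverse_mod(h, p)        # coefficients of the inverse of u on 1, a, a^2
print(t)
print(divmod_poly(mul(h, t), p)[1])   # remainder of h*t on division by p
```

The final remainder being the constant polynomial 1 is the statement $u\,t(a) = 1$ in this example, and $t(a)$ already lies in the span of $1, a, a^2$.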

DEFINITION The element $a \in K$ is said to be algebraic of degree n over
F if it satisfies a nonzero polynomial over F of degree n but no nonzero
polynomial of lower degree.

In the course of proving Theorem 5.1.2 (in each proof we gave), we proved
a somewhat sharper result than that stated in that theorem, namely,

THEOREM 5.1.3 If $a \in K$ is algebraic of degree n over F, then [F(a):F] = n.

This result adapts itself to many uses. We give now, as an immediate
consequence thereof, the very interesting

THEOREM 5.1.4 If a, b in K are algebraic over F then $a \pm b$, $ab$, and $a/b$
(if $b \neq 0$) are all algebraic over F. In other words, the elements in K which are
algebraic over F form a subfield of K.

Proof. Suppose that a is algebraic of degree m over F while b is algebraic
of degree n over F. By Theorem 5.1.3 the subfield T = F(a) of K is of
degree m over F. Now b is algebraic of degree n over F, a fortiori it is algebraic
of degree at most n over T which contains F. Thus the subfield W = T(b)
of K, again by Theorem 5.1.3, is of degree at most n over T. But
[W:F] = [W:T][T:F] by Theorem 5.1.1; therefore, $[W:F] \leq mn$ and so W is a
finite extension of F. However, a and b are both in W, whence all of
$a \pm b$, $ab$, and $a/b$ are in W. By Theorem 5.1.2, since [W:F] is finite,
these elements must be algebraic over F, thereby proving the theorem.

Here, too, we have proved somewhat more. Since $[W:F] \leq mn$, every
element in W satisfies a polynomial of degree at most mn over F, whence the

COROLLARY If a and b in K are algebraic over F of degrees m and n, respectively,
then $a \pm b$, $ab$, and $a/b$ (if $b \neq 0$) are algebraic over F of degree at most mn.

In the proof of the last theorem we made two extensions of the field F.
The first we called T; it was merely the field F(a). The second we called W
and it was T(b). Thus W = (F(a))(b); it is customary to write it as
F(a, b). Similarly, we could speak about F(b, a); it is not too difficult to
prove that F(a, b) = F(b, a). Continuing this pattern, we can define
$F(a_1, a_2, \ldots, a_n)$ for elements $a_1, \ldots, a_n$ in K.

DEFINITION The extension K of F is called an algebraic extension of F
if every element in K is algebraic over F.

We prove one more result along the lines of the theorems we have proved
so far.

THEOREM 5.1.5 If L is an algebraic extension of K and if K is an algebraic 
extension of F, then L is an algebraic extension of F. 

Proof. Let u be any arbitrary element of L; our objective is to show that
u satisfies some nontrivial polynomial with coefficients in F. What information
do we have at present? We certainly do know that u satisfies some
polynomial $x^n + \sigma_1 x^{n-1} + \cdots + \sigma_n$, where $\sigma_1, \ldots, \sigma_n$ are in K. But K
is algebraic over F; therefore, by several uses of Theorem 5.1.3,
$M = F(\sigma_1, \ldots, \sigma_n)$ is a finite extension of F. Since u satisfies the polynomial
$x^n + \sigma_1 x^{n-1} + \cdots + \sigma_n$ whose coefficients are in M, u is algebraic over
M. Invoking Theorem 5.1.2 yields that M(u) is a finite extension of M.
However, by Theorem 5.1.1, [M(u):F] = [M(u):M][M:F], whence
M(u) is a finite extension of F. But this implies that u is algebraic over F,
completing the proof of the theorem.

A quick description of Theorem 5.1.5: algebraic over algebraic is algebraic. 

The preceding results are of special interest in the particular case in 
which F is the field of rational numbers and K the field of complex numbers. 

DEFINITION A complex number is said to be an algebraic number if it is 
algebraic over the field of rational numbers. 

A complex number which is not algebraic is called transcendental. At the 
present stage we have no reason to suppose that there are any transcendental 
numbers. In the next section we shall prove that the familiar real number 
e is transcendental. This will, of course, establish the existence of
transcendental numbers. In actual fact, they exist in great abundance; in a
very well-defined way there are more of them than there are algebraic 
numbers. 

Theorem 5.1.4 applied to algebraic numbers proves the interesting fact
that the algebraic numbers form a field; that is, the sums, products, and quotients
of algebraic numbers are again algebraic numbers.

Theorem 5.1.5 when used in conjunction with the so-called “fundamental 
theorem of algebra,” has the implication that the roots of a polynomial 
whose coefficients are algebraic numbers are themselves algebraic numbers. 


Problems 


1. Prove that the mapping $\psi: F[x] \to F(a)$ defined by $h(x)\psi = h(a)$
is a homomorphism.

2. Let F be a field and let F[x] be the ring of polynomials in x over F.
Let $g(x)$, of degree n, be in F[x] and let $V = (g(x))$ be the ideal
generated by $g(x)$ in F[x]. Prove that F[x]/V is an n-dimensional
vector space over F.

3. (a) If V is a finite-dimensional vector space over the field K, and if
F is a subfield of K such that [K:F] is finite, show that V is a
finite-dimensional vector space over F and that moreover

$$\dim_F(V) = (\dim_K(V))([K:F]).$$

(b) Show that Theorem 5.1.1 is a special case of the result of part (a).

4. (a) Let R be the field of real numbers and Q the field of rational
numbers. In R, $\sqrt{2}$ and $\sqrt{3}$ are both algebraic over Q. Exhibit
a polynomial of degree 4 over Q satisfied by $\sqrt{2} + \sqrt{3}$.

(b) What is the degree of $\sqrt{2} + \sqrt{3}$ over Q? Prove your answer.

(c) What is the degree of $\sqrt{2}\sqrt{3}$ over Q?

5. With the same notation as in Problem 4, show that $\sqrt{2} + \sqrt[3]{5}$ is
algebraic over Q of degree 6.

*6. (a) Find an element $u \in R$ such that $Q(\sqrt{2}, \sqrt[3]{5}) = Q(u)$.

(b) In $Q(\sqrt{2}, \sqrt[3]{5})$ characterize all the elements w such that
$Q(w) \neq Q(\sqrt{2}, \sqrt[3]{5})$.

7. (a) Prove that F(a, b) = F(b, a).

(b) If $(i_1, i_2, \ldots, i_n)$ is any permutation of $(1, 2, \ldots, n)$, prove that

$$F(a_1, \ldots, a_n) = F(a_{i_1}, a_{i_2}, \ldots, a_{i_n}).$$

8. If a, b e K are algebraic over F of degrees m and n, respectively, 
and if m and n are relatively prime, prove that F(a, b) is of degree mn 
over F. 

9. Suppose that F is a field having a finite number of elements, q.

(a) Prove that there is a prime number p such that
$\underbrace{a + a + \cdots + a}_{p \text{ times}} = 0$ for all $a \in F$.

(b) Prove that $q = p^n$ for some integer n.

(c) If $a \in F$, prove that $a^q = a$.

(d) If $b \in K$ is algebraic over F, prove $b^{q^m} = b$ for some m > 0.

An algebraic number a is said to be an algebraic integer if it satisfies an
equation of the form $a^m + \alpha_1 a^{m-1} + \cdots + \alpha_m = 0$, where $\alpha_1, \ldots, \alpha_m$ are
integers.

10. If a is any algebraic number, prove that there is a positive integer n 
such that na is an algebraic integer. 

11. If the rational number r is also an algebraic integer, prove that r 
must be an ordinary integer. 

12. If a is an algebraic integer and m is an ordinary integer, prove

(a) a + m is an algebraic integer.

(b) ma is an algebraic integer.

13. If $\alpha$ is an algebraic integer satisfying $\alpha^3 + \alpha + 1 = 0$ and $\beta$ is an
algebraic integer satisfying $\beta^2 + \beta - 3 = 0$, prove that both
$\alpha + \beta$ and $\alpha\beta$ are algebraic integers.

14. (a) Prove that the sum of two algebraic integers is an algebraic
integer.

(b) Prove that the product of two algebraic integers is an algebraic
integer.

15. (a) Prove that sin 1° is an algebraic number. 

(b) From part (a) prove that sin m° is an algebraic number for any 
integer m. 

5.2 The Transcendence of e 

In defining algebraic and transcendental numbers we pointed out that it 
could be shown that transcendental numbers exist. One way of achieving 
this would be the demonstration that some specific number is transcendental. 

In 1851 Liouville gave a criterion that a complex number be algebraic; 
using this, he was able to write down a large collection of transcendental 
numbers. For instance, it follows from his work that the number 
.101001000000100 ... 10 ... is transcendental; here the number of zeros
between successive ones goes as 1!, 2!, ..., n!, ....

This certainly settled the question of existence. However, the question 
whether some given, familiar numbers were transcendental still persisted. 
The first success in this direction was by Hermite, who in 1873 gave a proof 
that e is transcendental. His proof was greatly simplified by Hilbert. The 
proof that we shall give here is a variation, due to Hurwitz, of Hilbert’s
proof. 

The number $\pi$ offered greater difficulties. These were finally overcome
by Lindemann, who in 1882 produced a proof that it is transcendental.
One immediate consequence of this is the fact that it is impossible, by
straightedge and compass, to square the circle, for such a construction
would lead to an algebraic number $\theta$ such that $\theta^2 = \pi$. But if $\theta$ is algebraic
then so is $\theta^2$, in virtue of which $\pi$ would be algebraic, in contradiction to
Lindemann’s result.

In 1934, working independently, Gelfond and Schneider proved that if
a and b are algebraic numbers, with $a \neq 0, 1$, and if b is irrational, then
$a^b$ is transcendental. This answered in the affirmative the question raised
by Hilbert whether $2^{\sqrt{2}}$ was transcendental.

For those interested in pursuing the subject of transcendental numbers 
further, we would strongly recommend the charming books by C. L. Siegel, 
entitled Transcendental Numbers, and by I. Niven, Irrational Numbers. 

To prove that e is irrational is easy; to prove that $\pi$ is irrational is much
more difficult. For a very clever and neat proof of the latter, see the paper
by Niven entitled “A simple proof that $\pi$ is irrational,” Bulletin of the American
Mathematical Society, Vol. 53 (1947), page 509.

Now to the transcendence of e. Aside from its intrinsic interest, its proof
offers us a change of pace. Up to this point all our arguments have been of
an algebraic nature; now, for a short while, we return to the more familiar
grounds of the calculus. The proof itself will use only elementary calculus;
the deepest result needed, therefrom, will be the mean value theorem.

THEOREM 5.2.1 The number e is transcendental. 

Proof. In the proof we shall use the standard notation $f^{(i)}(x)$ to denote
the ith derivative of $f(x)$ with respect to x.

Suppose that $f(x)$ is a polynomial of degree r with real coefficients.
Let $F(x) = f(x) + f^{(1)}(x) + f^{(2)}(x) + \cdots + f^{(r)}(x)$. We compute
$(d/dx)(e^{-x}F(x))$; using the fact that $f^{(r+1)}(x) = 0$ (since $f(x)$ is of degree r)
and the basic property of e, namely that $(d/dx)e^x = e^x$, we obtain
$(d/dx)(e^{-x}F(x)) = -e^{-x}f(x)$.

The mean value theorem asserts that if $g(x)$ is a continuously differentiable,
single-valued function on the closed interval $[x_1, x_2]$ then

$$\frac{g(x_1) - g(x_2)}{x_1 - x_2} = g^{(1)}(x_1 + \theta(x_2 - x_1)), \quad \text{where } 0 < \theta < 1.$$

We apply this to our function $e^{-x}F(x)$, which certainly satisfies all the
required conditions for the mean value theorem on the closed interval
$[x_1, x_2]$ where $x_1 = 0$ and $x_2 = k$, where k is any positive integer. We then
obtain that $e^{-k}F(k) - F(0) = -e^{-\theta_k k} f(\theta_k k)\,k$, where $\theta_k$ depends on k and
is some real number between 0 and 1. Multiplying this relation through by
$e^k$ yields $F(k) - F(0)e^k = -e^{(1-\theta_k)k} f(\theta_k k)\,k$. We write this out explicitly:

$$\begin{aligned}
F(1) - eF(0) &= -e^{(1-\theta_1)} f(\theta_1) = \varepsilon_1, \\
F(2) - e^2 F(0) &= -2e^{2(1-\theta_2)} f(2\theta_2) = \varepsilon_2, \\
&\;\;\vdots \\
F(n) - e^n F(0) &= -n e^{n(1-\theta_n)} f(n\theta_n) = \varepsilon_n.
\end{aligned} \qquad (1)$$

Suppose now that e is an algebraic number; then it satisfies some relation
of the form

$$c_n e^n + c_{n-1} e^{n-1} + \cdots + c_1 e + c_0 = 0, \qquad (2)$$

where $c_0, c_1, \ldots, c_n$ are integers and where $c_0 > 0$.

In the relations (1) let us multiply the first equation by $c_1$, the second by
$c_2$, and so on; adding these up we get
$c_1 F(1) + c_2 F(2) + \cdots + c_n F(n) - F(0)(c_1 e + c_2 e^2 + \cdots + c_n e^n) =
c_1 \varepsilon_1 + c_2 \varepsilon_2 + \cdots + c_n \varepsilon_n$.

In view of relation (2), $c_1 e + c_2 e^2 + \cdots + c_n e^n = -c_0$, whence the
above equation simplifies to

$$c_0 F(0) + c_1 F(1) + \cdots + c_n F(n) = c_1 \varepsilon_1 + \cdots + c_n \varepsilon_n. \qquad (3)$$

All this discussion has held for the $F(x)$ constructed from an arbitrary
polynomial $f(x)$. We now see what all this implies for a very specific
polynomial, one first used by Hermite, namely,

$$f(x) = \frac{1}{(p-1)!}\, x^{p-1} (1 - x)^p (2 - x)^p \cdots (n - x)^p.$$

Here p can be any prime number chosen so that p > n and $p > c_0$. For
this polynomial we shall take a very close look at $F(0), F(1), \ldots, F(n)$
and we shall carry out an estimate on the size of $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$.

When expanded, $f(x)$ is a polynomial of the form

$$\frac{(n!)^p}{(p-1)!}\, x^{p-1} + \frac{a_0 x^p}{(p-1)!} + \frac{a_1 x^{p+1}}{(p-1)!} + \cdots,$$

where $a_0, a_1, \ldots$ are integers.

When $i \geq p$ we claim that $f^{(i)}(x)$ is a polynomial, with coefficients
which are integers all of which are multiples of p. (Prove! See Problem 2.)
Thus for any integer j, $f^{(i)}(j)$, for $i \geq p$, is an integer and is a multiple of p.
Now, from its very definition, $f(x)$ has a root of multiplicity p at
$x = 1, 2, \ldots, n$. Thus for $j = 1, 2, \ldots, n$, $f(j) = 0$, $f^{(1)}(j) = 0$, ...,
$f^{(p-1)}(j) = 0$. However, $F(j) = f(j) + f^{(1)}(j) + \cdots + f^{(p-1)}(j) + f^{(p)}(j) + \cdots + f^{(r)}(j)$;
by the discussion above, for $j = 1, 2, \ldots, n$, $F(j)$ is an integer and
is a multiple of p.

What about F(0)? Since / (x) has a root of multiplicity p - 1 at x = 0, 
/(0) =/ (1) (0) = ••• =/ (p_2) ( 0) = 0. For i > p, / (0 ( 0) is an integer 
which is a multiple of p. But f^ p 1 ^(0) = (w!) p and since p > n and is a 
prime number, p f (n\) p so that / (p_1) (0) is an integer not divisible by p. 

Since F(0) =/(0) +/ (1) (0) + +/ (p “ 2) (0) +/ (p " 1) (0) + / (p) (0) + 

... + / (r) (0), we conclude that F(0) is an integer not divisible by p. Because 
c 0 > 0 and p > c 0 and because p f F( 0) whereas p | F(l), p \ F(\ 2), • • •, 
p | F(n), we can assert that c o F(0) + + ’ ‘' + c n^{. n ) * s an i nte S er 

and is not divisible by p. 
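
The divisibility claims above can be checked by direct computation for small cases. The following is only a sketch, not part of the proof: the function names (`poly_mul`, `poly_deriv`, `poly_eval`, `hermite_F_values`) are made up for illustration, and it assumes that F(x) is the sum of f(x) and all of its derivatives, as in the construction used above. It builds Hermite's polynomial for a given n and prime p > n and returns F(0), F(1), ..., F(n) as exact fractions.

```python
from fractions import Fraction
from math import factorial

def poly_mul(a, b):
    # Multiply two polynomials given as coefficient lists, lowest degree first.
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_deriv(a):
    # Formal derivative; the derivative of a constant is the empty list.
    return [k * a[k] for k in range(1, len(a))]

def poly_eval(a, x):
    # Horner evaluation at the integer x.
    acc = 0
    for c in reversed(a):
        acc = acc * x + c
    return acc

def hermite_F_values(n, p):
    # g(x) = x^(p-1) (1-x)^p ... (n-x)^p, so that f(x) = g(x) / (p-1)!.
    g = [0] * (p - 1) + [1]
    for j in range(1, n + 1):
        for _ in range(p):
            g = poly_mul(g, [j, -1])
    # Sum g and all its derivatives at 0, 1, ..., n, then divide by (p-1)!.
    totals = [0] * (n + 1)
    d = g
    while d:
        for j in range(n + 1):
            totals[j] += poly_eval(d, j)
        d = poly_deriv(d)
    return [Fraction(t, factorial(p - 1)) for t in totals]
```

For instance, with n = 2 and p = 3, every returned value should have denominator 1, the entries for j = 1, 2 should be multiples of 3, and the entry for j = 0 should not be.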

However, by (3), $c_0F(0) + c_1F(1) + \cdots + c_nF(n) = c_1\varepsilon_1 + \cdots + c_n\varepsilon_n$.
What can we say about $\varepsilon_i$? Let us recall that

$\varepsilon_i = \frac{-e^{i(1-\theta_i)} (1 - i\theta_i)^p \cdots (n - i\theta_i)^p (i\theta_i)^{p-1}\, i}{(p-1)!},$

where $0 < \theta_i < 1$. Thus

$|\varepsilon_i| \le \frac{e^n n^p (n!)^p}{(p-1)!}.$

As $p \to \infty$,

$\frac{e^n n^p (n!)^p}{(p-1)!} \to 0$

(Prove!) whence we can find a prime number larger than both $c_0$ and n and
large enough to force $|c_1\varepsilon_1 + \cdots + c_n\varepsilon_n| < 1$. But $c_1\varepsilon_1 + \cdots + c_n\varepsilon_n =
c_0F(0) + \cdots + c_nF(n)$, so must be an integer; since it is smaller than 1 in
size our only possible conclusion is that $c_1\varepsilon_1 + \cdots + c_n\varepsilon_n = 0$.
Consequently, $c_0F(0) + \cdots + c_nF(n) = 0$; this however is sheer nonsense, since
we know that $p \nmid (c_0F(0) + \cdots + c_nF(n))$, whereas $p \mid 0$. This
contradiction, stemming from the assumption that e is algebraic, proves that e must
be transcendental.


Problems 


1. Using the infinite series for e,

$e = 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{m!} + \cdots,$

prove that e is irrational.

2. If g(x) is a polynomial with integer coefficients, prove that if p is a prime
number then for $i \ge p$,

$\frac{d^i}{dx^i}\left(\frac{g(x)}{(p-1)!}\right)$

is a polynomial with integer coefficients each of which is divisible by p.

3. If a is any real number, prove that $a^m/m! \to 0$ as $m \to \infty$.

4. If m > 0 and n are integers, prove that $e^{m/n}$ is transcendental.


5.3 Roots of Polynomials 

In Section 5.1 we discussed elements in a given extension K of F which were
algebraic over F, that is, elements which satisfied polynomials in F[x].
We now turn the problem around; given a polynomial p(x) in F[x] we
wish to find a field K which is an extension of F in which p(x) has a root.
No longer is the field K available to us; in fact it is our prime objective to
construct it. Once it is constructed, we shall examine it more closely and
see what consequences we can derive.

DEFINITION If $p(x) \in F[x]$, then an element a lying in some extension
field of F is called a root of p(x) if p(a) = 0.

We begin with the familiar result known as the Remainder Theorem.

LEMMA 5.3.1 If $p(x) \in F[x]$ and if K is an extension of F, then for any
element $b \in K$, p(x) = (x - b)q(x) + p(b) where $q(x) \in K[x]$ and where
deg q(x) = deg p(x) - 1.




Proof. Since $F \subset K$, F[x] is contained in K[x], whence we can
consider p(x) to be lying in K[x]. By the division algorithm for polynomials
in K[x], p(x) = (x - b)q(x) + r, where $q(x) \in K[x]$ and where r = 0
or deg r < deg (x - b) = 1. Thus either r = 0 or deg r = 0; in either
case r must be an element of K. But exactly what element of K is it?
Since p(x) = (x - b)q(x) + r, p(b) = (b - b)q(b) + r = r. Therefore,
p(x) = (x - b)q(x) + p(b). That the degree of q(x) is one less than that of
p(x) is easy to verify and is left to the reader.

COROLLARY If $a \in K$ is a root of $p(x) \in F[x]$, where $F \subset K$, then in K[x],
(x - a) | p(x).

Proof. From Lemma 5.3.1, in K[x], p(x) = (x - a)q(x) + p(a) =
(x - a)q(x) since p(a) = 0. Thus (x - a) | p(x) in K[x].

DEFINITION The element $a \in K$ is a root of $p(x) \in F[x]$ of multiplicity
m if $(x - a)^m \mid p(x)$, whereas $(x - a)^{m+1} \nmid p(x)$.

A reasonable question to ask is, How many roots can a polynomial have 
in a given field? Before answering we must decide how to count a root of 
multiplicity m. We shall always count it as m roots. Even with this convention 
we can prove 

LEMMA 5.3.2 A polynomial of degree n over a field can have at most n roots in 
any extension field. 

Proof. We proceed by induction on n, the degree of the polynomial p(x).
If p(x) is of degree 1, then it must be of the form $\alpha x + \beta$ where $\alpha, \beta$ are
in a field F and where $\alpha \ne 0$. Any a such that p(a) = 0 must then imply
that $\alpha a + \beta = 0$, from which we conclude that $a = -\beta/\alpha$. That is,
p(x) has the unique root $-\beta/\alpha$, whence the conclusion of the lemma
certainly holds in this case.

Assuming the result to be true in any field for all polynomials of degree
less than n, let us suppose that p(x) is of degree n over F. Let K be any
extension of F. If p(x) has no roots in K, then we are certainly done, for the
number of roots in K, namely zero, is definitely at most n. So, suppose that
p(x) has at least one root $a \in K$ and that a is a root of multiplicity m. Since
$(x - a)^m \mid p(x)$, $m \le n$ follows. Now $p(x) = (x - a)^m q(x)$, where $q(x) \in K[x]$
is of degree n - m. From the fact that $(x - a)^{m+1} \nmid p(x)$, we get that
$(x - a) \nmid q(x)$, whence, by the corollary to Lemma 5.3.1, a is not a root
of q(x). If $b \ne a$ is a root, in K, of p(x), then $0 = p(b) = (b - a)^m q(b)$;
however, since $b - a \ne 0$ and since we are in a field, we conclude that
q(b) = 0. That is, any root of p(x), in K, other than a, must be a root of
q(x). Since q(x) is of degree n - m < n, by our induction hypothesis, q(x)
has at most n - m roots in K, which, together with the other root a
counted m times, tells us that p(x) has at most m + (n - m) = n roots in
K. This completes the induction and proves the lemma.

One should point out that commutativity is essential in Lemma 5.3.2.
If we consider the ring of real quaternions, which falls short of being a field
only in that it fails to be commutative, then the polynomial $x^2 + 1$ has at
least 3 roots, i, j, k (in fact, it has an infinite number of roots). In a
somewhat different direction we need, even when the ring is commutative, that
it be an integral domain, for if ab = 0 with $a \ne 0$ and $b \ne 0$ in the
commutative ring R, then the polynomial ax of degree 1 over R has at least
two distinct roots x = 0 and x = b in R.
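
The failure of the root bound over a commutative ring with zero divisors is easy to see by brute force in the integers mod m. The sketch below is illustrative only; `roots_mod` is a hypothetical helper, and the rings Z/6, Z/7 and Z/8 are chosen here as examples and do not come from the text.

```python
def roots_mod(coeffs, m):
    """Return all a in {0, ..., m-1} with p(a) = 0 in the integers mod m.

    coeffs[k] is the coefficient of x**k.
    """
    return [
        a for a in range(m)
        if sum(c * pow(a, k, m) for k, c in enumerate(coeffs)) % m == 0
    ]

# 2x over Z/6: here a = 2, b = 3 and ab = 0, so x = 0 and x = 3 are both roots.
two_x_mod_6 = roots_mod([0, 2], 6)

# x^2 - 1 has four roots over Z/8 (not an integral domain),
# but only two over the field Z/7.
x2_minus_1_mod_8 = roots_mod([-1, 0, 1], 8)
x2_minus_1_mod_7 = roots_mod([-1, 0, 1], 7)
```

The degree 1 polynomial 2x having two roots in Z/6 is exactly the situation ab = 0 described above.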

The previous two lemmas, while interesting, are of subsidiary interest.
We now set ourselves to our prime task, that of providing ourselves with
suitable extensions of F in which a given polynomial has roots. Once this is
done, we shall be able to analyze such extensions to a reasonable enough
degree of accuracy to get results. The most important step in the construction
is accomplished for us in the next theorem. The argument used will be very
reminiscent of some used in Section 5.1.

THEOREM 5.3.1 If p(x) is a polynomial in F[x] of degree $n \ge 1$ and is
irreducible over F, then there is an extension E of F, such that [E:F] = n, in which
p(x) has a root.

Proof. Let F[x] be the ring of polynomials in x over F and let V =
(p(x)) be the ideal of F[x] generated by p(x). By Lemma 3.9.6, V is a
maximal ideal of F[x], whence by Theorem 3.5.1, E = F[x]/V is a field.
This E will be shown to satisfy the conclusions of the theorem.

First we want to show that E is an extension of F; however, in fact, it is
not! But let $\bar{F}$ be the image of F in E; that is, $\bar{F} = \{\alpha + V \mid \alpha \in F\}$. We
assert that $\bar{F}$ is a field isomorphic to F; in fact, if $\psi$ is the mapping from
F[x] into F[x]/V = E defined by $f(x)\psi = f(x) + V$, then the restriction
of $\psi$ to F induces an isomorphism of F onto $\bar{F}$. (Prove!) Using this
isomorphism, we identify F and $\bar{F}$; in this way we can consider E to be an extension
of F.

We claim that E is a finite extension of F of degree n = deg p(x), for the
elements $1 + V$, $x + V$, $(x + V)^2 = x^2 + V$, ..., $(x + V)^i = x^i + V$, ...,
$(x + V)^{n-1} = x^{n-1} + V$ form a basis of E over F. (Prove!) For
convenience of notation let us denote the element $x\psi = x + V$ in the field
E as a. Given $f(x) \in F[x]$, what is $f(x)\psi$? We claim that it is merely
f(a), for since $\psi$ is a homomorphism, if $f(x) = \beta_0 + \beta_1x + \cdots + \beta_kx^k$,
then $f(x)\psi = \beta_0\psi + (\beta_1\psi)(x\psi) + \cdots + (\beta_k\psi)(x\psi)^k$, and using the
identification indicated above of $\beta\psi$ with $\beta$, we see that $f(x)\psi = f(a)$.
In particular, since $p(x) \in V$, $p(x)\psi = 0$; however, $p(x)\psi = p(a)$. Thus
the element $a = x\psi$ in E is a root of p(x). The field E has been shown to satisfy
all the properties required in the conclusion of Theorem 5.3.1, and so this
theorem is now proved.
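
The construction E = F[x]/V can be made concrete when F is the field of integers mod a prime q. The following sketch is an illustration under assumptions that are not in the text: F = Z/q, p(x) is monic, and elements of E are stored as coefficient lists of length n = deg p(x) (lowest degree first), that is, as coordinates with respect to the basis 1 + V, x + V, ..., x^(n-1) + V. The helper names `poly_mod`, `mul` and `evaluate` are hypothetical.

```python
def poly_mod(a, p_poly, q):
    """Reduce the polynomial a modulo the monic polynomial p_poly over Z/q."""
    a = [c % q for c in a]
    n = len(p_poly) - 1
    while len(a) > n:
        lead = a.pop()  # coefficient of the current top power of x
        if lead:
            # Replace x^n by -(p_0 + p_1 x + ... + p_{n-1} x^{n-1}).
            shift = len(a) - n
            for k in range(n):
                a[shift + k] = (a[shift + k] - lead * p_poly[k]) % q
    return a + [0] * (n - len(a))

def mul(a, b, p_poly, q):
    """Multiply two cosets, each given by a representative of degree < n."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] = (prod[i + j] + ai * bj) % q
    return poly_mod(prod, p_poly, q)

def evaluate(f, alpha, p_poly, q):
    """Evaluate f (coefficients in Z/q) at the element alpha of E, by Horner."""
    acc = [0] * (len(p_poly) - 1)
    for c in reversed(f):
        acc = mul(acc, alpha, p_poly, q)
        acc[0] = (acc[0] + c) % q
    return acc
```

With q = 2 and p(x) = x^2 + x + 1, the coset a = x + V is `[0, 1]`, and `evaluate([1, 1, 1], [0, 1], [1, 1, 1], 2)` gives the zero element `[0, 0]`, so a is a root of p(x) in E, in line with the theorem.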

An immediate consequence of this theorem is the 

COROLLARY If $f(x) \in F[x]$, then there is a finite extension E of F in which
f(x) has a root. Moreover, $[E:F] \le \deg f(x)$.

Proof. Let p(x) be an irreducible factor of f(x); any root of p(x) is a
root of f(x). By the theorem there is an extension E of F with $[E:F] =
\deg p(x) \le \deg f(x)$ in which p(x), and so, f(x) has a root.

Although it is, in actuality, a corollary to the above corollary, the next 
theorem is of such great importance that we single it out as a theorem. 

THEOREM 5.3.2 Let $f(x) \in F[x]$ be of degree $n \ge 1$. Then there is an
extension E of F of degree at most n! in which f(x) has n roots (and so, a full
complement of roots).

Proof. In the statement of the theorem, a root of multiplicity m is, of
course, counted as m roots.

By the above corollary there is an extension $E_0$ of F with $[E_0:F] \le n$ in
which f(x) has a root $\alpha$. Thus in $E_0[x]$, f(x) factors as $f(x) = (x - \alpha)q(x)$,
where q(x) is of degree n - 1. Using induction (or continuing the above
process), there is an extension E of $E_0$ of degree at most (n - 1)! in which
q(x) has n - 1 roots. Since any root of f(x) is either $\alpha$ or a root of q(x), we
obtain in E all n roots of f(x). Now, $[E:F] = [E:E_0][E_0:F] \le (n - 1)!\,n = n!$.
All the pieces of the theorem are now established.

Theorem 5.3.2 asserts the existence of a finite extension E in which the
given polynomial f(x), of degree n, over F has n roots. If $f(x) = a_0x^n +
a_1x^{n-1} + \cdots + a_n$, $a_0 \ne 0$, and if the n roots in E are $\alpha_1, \ldots, \alpha_n$, making
use of the corollary to Lemma 5.3.1, f(x) can be factored over E as $f(x) =
a_0(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$. Thus f(x) splits up completely over E
as a product of linear (first degree) factors. Since a finite extension of F
exists with this property, a finite extension of F of minimal degree exists which
also enjoys this property of decomposing f(x) as a product of linear factors.
For such a minimal extension, no proper subfield has the property that
f(x) factors over it into the product of linear factors. This prompts the

DEFINITION If $f(x) \in F[x]$, a finite extension E of F is said to be a
splitting field over F for f(x) if over E (that is, in E[x]), but not over any
proper subfield of E, f(x) can be factored as a product of linear factors.

We reiterate: Theorem 5.3.2 guarantees for us the existence of splitting fields.
In fact, it says even more, for it assures that given a polynomial of degree
n over F there is a splitting field of this polynomial which is an extension of
F of degree at most n! over F. We shall see later that this upper bound of
n! is actually taken on; that is, given n, we can find a field F and a
polynomial of degree n in F[x] such that the splitting field of f(x) over F has
degree n!.

Equivalent to the definition we gave of a splitting field for f(x) over F is
the statement: E is a splitting field of f(x) over F if E is a minimal extension
of F in which f(x) has n roots, where n = deg f(x).

An immediate question arises: given two splitting fields $E_1$ and $E_2$ of the
same polynomial f(x) in F[x], what is their relation to each other? At
first glance, we have no right to assume that they are at all related. Our
next objective is to show that they are indeed intimately related; in fact,
that they are isomorphic by an isomorphism leaving every element of F
fixed. It is in this direction that we now turn.

Let F and F' be two fields and let $\tau$ be an isomorphism of F onto F'.
For convenience let us denote the image of any $\alpha \in F$ under $\tau$ by $\alpha'$; that
is, $\alpha\tau = \alpha'$. We shall maintain this notation for the next few pages.

Can we make use of $\tau$ to set up an isomorphism between F[x] and F'[t],
the respective polynomial rings over F and F'? Why not try the obvious?
For an arbitrary polynomial $f(x) = \alpha_0x^n + \alpha_1x^{n-1} + \cdots + \alpha_n \in F[x]$ we
define $\tau^*$ by $f(x)\tau^* = (\alpha_0x^n + \alpha_1x^{n-1} + \cdots + \alpha_n)\tau^* = \alpha_0't^n + \alpha_1't^{n-1} +
\cdots + \alpha_n'$.

It is an easy and straightforward matter, which we leave to the reader,
to verify

LEMMA 5.3.3 $\tau^*$ defines an isomorphism of F[x] onto F'[t] with the property
that $\alpha\tau^* = \alpha'$ for every $\alpha \in F$.

If f(x) is in F[x] we shall write $f(x)\tau^*$ as f'(t). Lemma 5.3.3 immediately
implies that factorizations of f(x) in F[x] result in like factorizations of
f'(t) in F'[t], and vice versa. In particular, f(x) is irreducible in F[x]
if and only if f'(t) is irreducible in F'[t].

However, at the moment, we are not particularly interested in polynomial
rings, but rather, in extensions of F. Let us recall that in the proof of
Theorem 5.1.2 we employed quotient rings of polynomial rings to obtain
suitable extensions of F. In consequence it should be natural for us to study
the relationship between F[x]/(f(x)) and F'[t]/(f'(t)), where (f(x))
denotes the ideal generated by f(x) in F[x] and (f'(t)) that generated by
f'(t) in F'[t]. The next lemma, which is relevant to this question, is actually
part of a more general, purely ring-theoretic result, but we shall content
ourselves with it as applied in our very special setting.

LEMMA 5.3.4 There is an isomorphism $\tau^{**}$ of F[x]/(f(x)) onto F'[t]/(f'(t))
with the property that for every $\alpha \in F$, $\alpha\tau^{**} = \alpha'$, $(x + (f(x)))\tau^{**} = t + (f'(t))$.

Proof. Before starting with the proof proper, we should make clear what
is meant by the last part of the statement of the lemma. As we have already
done several times, we can consider F as imbedded in F[x]/(f(x)) by
identifying the element $\alpha \in F$ with the coset $\alpha + (f(x))$ in F[x]/(f(x)).
Similarly, we can consider F' to be contained in F'[t]/(f'(t)). The
isomorphism $\tau^{**}$ is then supposed to satisfy $[\alpha + (f(x))]\tau^{**} = \alpha' + (f'(t))$.

We seek an isomorphism $\tau^{**}$ of F[x]/(f(x)) onto F'[t]/(f'(t)).
What could be simpler or more natural than to try the $\tau^{**}$ defined by
$[g(x) + (f(x))]\tau^{**} = g'(t) + (f'(t))$ for every $g(x) \in F[x]$? We leave
it as an exercise to fill in the necessary details that the $\tau^{**}$ so defined is well
defined and is an isomorphism of F[x]/(f(x)) onto F'[t]/(f'(t)) with the
properties needed to fulfill the statement of Lemma 5.3.4.

For our purpose—that of proving the uniqueness of splitting fields— 
Lemma 5.3.4 provides us with the entering wedge, for we can now prove 

THEOREM 5.3.3 If p(x) is irreducible in F[x] and if v is a root of p(x), then
F(v) is isomorphic to F'(w) where w is a root of p'(t); moreover, this isomorphism
$\sigma$ can so be chosen that

1. $v\sigma = w$.

2. $\alpha\sigma = \alpha'$ for every $\alpha \in F$.

Proof. Let v be a root of the irreducible polynomial p(x) lying in some
extension K of F. Let $M = \{f(x) \in F[x] \mid f(v) = 0\}$. Trivially M is an
ideal of F[x], and $M \ne F[x]$. Since $p(x) \in M$ and is an irreducible
polynomial, we have that M = (p(x)). As in the proof of Theorem 5.1.2, map
F[x] into $F(v) \subset K$ by the mapping $\psi$ defined by $q(x)\psi = q(v)$ for every
$q(x) \in F[x]$. We saw earlier (in the proof of Theorem 5.1.2) that $\psi$ maps
F[x] onto F(v). The kernel of $\psi$ is precisely M, so must be (p(x)). By the
fundamental homomorphism theorem for rings there is an isomorphism $\psi^*$
of F[x]/(p(x)) onto F(v). Note further that $\alpha\psi^* = \alpha$ for every $\alpha \in F$.
Summing up: $\psi^*$ is an isomorphism of F[x]/(p(x)) onto F(v) leaving
every element of F fixed and with the property that $v = [x + (p(x))]\psi^*$.

Since p(x) is irreducible in F[x], p'(t) is irreducible in F'[t] (by Lemma
5.3.3), and so there is an isomorphism $\theta^*$ of F'[t]/(p'(t)) onto F'(w) where
w is a root of p'(t) such that $\theta^*$ leaves every element of F' fixed and such
that $[t + (p'(t))]\theta^* = w$.

We now stitch the pieces together to prove Theorem 5.3.3. By Lemma
5.3.4 there is an isomorphism $\tau^{**}$ of F[x]/(p(x)) onto F'[t]/(p'(t)) which
coincides with $\tau$ on F and which takes x + (p(x)) onto t + (p'(t)).
Consider the mapping $\sigma = (\psi^*)^{-1}\tau^{**}\theta^*$ (motivated by

$F(v) \xrightarrow{(\psi^*)^{-1}} \frac{F[x]}{(p(x))} \xrightarrow{\tau^{**}} \frac{F'[t]}{(p'(t))} \xrightarrow{\theta^*} F'(w)$)

of F(v) onto F'(w). It is an isomorphism of F(v) onto F'(w) since all the
mappings $\psi^*$, $\tau^{**}$, and $\theta^*$ are isomorphisms and onto. Moreover, since
$v = [x + (p(x))]\psi^*$, $v\sigma = (v(\psi^*)^{-1})\tau^{**}\theta^* = ([x + (p(x))]\tau^{**})\theta^* =
[t + (p'(t))]\theta^* = w$. Also, for $\alpha \in F$, $\alpha\sigma = (\alpha(\psi^*)^{-1})\tau^{**}\theta^* = (\alpha\tau^{**})\theta^* =
\alpha'\theta^* = \alpha'$. We have shown that $\sigma$ is an isomorphism satisfying all the
requirements of the isomorphism in the statement of the theorem. Thus
Theorem 5.3.3 has been proved.


A special case, but itself of interest, is the 


COROLLARY If $p(x) \in F[x]$ is irreducible and if a, b are two roots of p(x),
then F(a) is isomorphic to F(b) by an isomorphism which takes a onto b and which
leaves every element of F fixed.
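
A familiar instance of this corollary, offered here only as an illustrative sketch, takes F to be the rationals and $p(x) = x^2 - d$ with d not a square, so that the two roots are $\sqrt{d}$ and $-\sqrt{d}$. Elements $a + b\sqrt{d}$ are stored as pairs (a, b) of fractions; the names `add`, `mul` and `conj` are hypothetical, and the choice d = 2 in the usage note is an assumption made for the example.

```python
from fractions import Fraction

def add(u, v):
    # (a + b*sqrt(d)) + (c + e*sqrt(d))
    return (u[0] + v[0], u[1] + v[1])

def mul(u, v, d):
    # (a + b*sqrt(d)) * (c + e*sqrt(d)) = (ac + d*be) + (ae + bc)*sqrt(d)
    a, b = u
    c, e = v
    return (a * c + d * b * e, a * e + b * c)

def conj(u):
    # The map sending sqrt(d) to -sqrt(d) and fixing every rational number.
    return (u[0], -u[1])

def respects_operations(u, v, d):
    # True when conj preserves both the sum and the product of u and v.
    sums_agree = conj(add(u, v)) == add(conj(u), conj(v))
    products_agree = conj(mul(u, v, d)) == mul(conj(u), conj(v), d)
    return sums_agree and products_agree
```

For example, `respects_operations((Fraction(1), Fraction(2)), (Fraction(3), Fraction(-1)), 2)` should hold, and `conj` sends the root (0, 1) to the other root (0, -1) while fixing every pair of the form (a, 0).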


We now come to the theorem which is, as we indicated earlier, the 
foundation stone on which the whole Galois theory rests. For us it is the 
focal point of this whole section. 


THEOREM 5.3.4 Any splitting fields E and E' of the polynomials $f(x) \in F[x]$
and $f'(t) \in F'[t]$, respectively, are isomorphic by an isomorphism $\phi$ with the
property that $\alpha\phi = \alpha'$ for every $\alpha \in F$. (In particular, any two splitting fields of the
same polynomial over a given field F are isomorphic by an isomorphism leaving every
element of F fixed.)

Proof. We should like to use an argument by induction; in order to do 
so, we need an integer-valued indicator of size which we can decrease by 
some technique or other. We shall use as our indicator the degree of some 
splitting field over the initial field. It may seem artificial (in fact, it may 
even be artificial), but we use it because, as we shall soon see, Theorem 5.3.3 
provides us with the mechanism for decreasing it. 

If [E:F] = 1, then E = F, whence f(x) splits into a product of linear
factors over F itself. By Lemma 5.3.3 f'(t) splits over F' into a product of
linear factors, hence E' = F'. But then $\phi = \tau$ provides us with an
isomorphism of E onto E' coinciding with $\tau$ on F.

Assume the result to be true for any field $F_0$ and any polynomial $f(x) \in
F_0[x]$ provided the degree of some splitting field $E_0$ of f(x) has degree less
than n over $F_0$, that is, $[E_0:F_0] < n$.

Suppose that [E:F] = n > 1, where E is a splitting field of f(x) over F.
Since n > 1, f(x) has an irreducible factor p(x) of degree r > 1. Let
p'(t) be the corresponding irreducible factor of f'(t). Since E splits f(x), a
full complement of roots of f(x), and so, a priori, of roots of p(x), are in E.
Thus there is a $v \in E$ such that p(v) = 0; by Theorem 5.1.3, [F(v):F] = r.
Similarly, there is a $w \in E'$ such that p'(w) = 0. By Theorem 5.3.3 there
is an isomorphism $\sigma$ of F(v) onto F'(w) with the property that $\alpha\sigma = \alpha'$
for every $\alpha \in F$.

Since [F(v):F] = r > 1,

$[E:F(v)] = \frac{[E:F]}{[F(v):F]} = \frac{n}{r} < n.$

We claim that E is a splitting field for f(x) considered as a polynomial over
$F_0 = F(v)$, for no subfield of E, containing $F_0$ and hence F, can split f(x),
since E is assumed to be a splitting field of f(x) over F. Similarly E' is a
splitting field for f'(t) over $F_0' = F'(w)$. By our induction hypothesis there
is an isomorphism $\phi$ of E onto E' such that $a\phi = a\sigma$ for all $a \in F_0$. But
for every $\alpha \in F$, $\alpha\sigma = \alpha'$ hence for every $\alpha \in F \subset F_0$, $\alpha\phi = \alpha\sigma = \alpha'$.
This completes the induction and proves the theorem.

To see the truth of the “(in particular . . .)” part, let F = F' and let $\tau$
be the identity map $\alpha\tau = \alpha$ for every $\alpha \in F$. Suppose that $E_1$ and $E_2$ are
two splitting fields of $f(x) \in F[x]$. Considering $E_1 = E \supset F$ and $E_2 =
E' \supset F' = F$, and applying the theorem just proved, yields that $E_1$ and
$E_2$ are isomorphic by an isomorphism leaving every element of F fixed.

In view of the fact that any two splitting fields of the same polynomial 
over F are isomorphic and by an isomorphism leaving every element of F 
fixed, we are justified in speaking about the splitting field, rather than a 
splitting field, for it is essentially unique. 


Examples 

1. Let F be any field and let $p(x) = x^2 + \alpha x + \beta$, $\alpha, \beta \in F$, be in F[x].
If K is any extension of F in which p(x) has a root, a, then the element
$b = -\alpha - a$ also in K is also a root of p(x). If b = a it is easy to check
that p(x) must then be $p(x) = (x - a)^2$, and so both roots of p(x) are in
K. If $b \ne a$ then again both roots of p(x) are in K. Consequently, p(x)
can be split by an extension of degree 2 of F. We could also get this result
directly by invoking Theorem 5.3.2.

2. Let F be the field of rational numbers and let $f(x) = x^3 - 2$. In the
field of complex numbers the three roots of f(x) are $\sqrt[3]{2}$, $\omega\sqrt[3]{2}$, $\omega^2\sqrt[3]{2}$,
where $\omega = (-1 + \sqrt{3}\,i)/2$ and where $\sqrt[3]{2}$ is a real cube root of 2. Now
$F(\sqrt[3]{2})$ cannot split $x^3 - 2$, for, as a subfield of the real field, it cannot
contain the complex, but not real, number $\omega\sqrt[3]{2}$. Without explicitly
determining it, what can we say about E, the splitting field of $x^3 - 2$ over
F? By Theorem 5.3.2, $[E:F] \le 3! = 6$; by the above remark, since
$x^3 - 2$ is irreducible over F and since $[F(\sqrt[3]{2}):F] = 3$, by the corollary to
Theorem 5.1.1, $3 = [F(\sqrt[3]{2}):F] \mid [E:F]$. Finally, $[E:F] > [F(\sqrt[3]{2}):F] = 3$.
The only way out is [E:F] = 6. We could, of course, get this result by
making two extensions $F_1 = F(\sqrt[3]{2})$ and $E = F_1(\omega)$ and showing that $\omega$
satisfies an irreducible quadratic equation over $F_1$.

3. Let F be the field of rational numbers and let
$f(x) = x^4 + x^2 + 1 \in F[x]$.
We claim that $E = F(\omega)$, where $\omega = (-1 + \sqrt{3}\,i)/2$, is a splitting field
of f(x). Thus [E:F] = 2, far short of the maximum possible 4! = 24.
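
A quick numerical sanity check of Example 3, which is not a proof, is sketched below. It assumes that the four roots of $x^4 + x^2 + 1$ are $\omega, \omega^2, -\omega, -\omega^2$ (this list is our guess, to be confirmed in Problem 5) and evaluates the polynomial at each of them in floating-point complex arithmetic.

```python
def f(z):
    # The polynomial of Example 3.
    return z**4 + z**2 + 1

# omega = (-1 + sqrt(3) i) / 2, a primitive cube root of unity.
omega = complex(-0.5, 3 ** 0.5 / 2)

# Conjectured roots; each lies in F(omega).
candidate_roots = [omega, omega**2, -omega, -omega**2]

# Absolute values of f at the candidates; all should be near zero.
residuals = [abs(f(z)) for z in candidate_roots]
```

Each residual should be zero up to rounding error, and the four candidates are distinct, which is consistent with f(x) splitting into linear factors over $F(\omega)$.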

Problems 

1. In the proof of Lemma 5.3.1, prove that the degree of q(x) is one less
than that of p(x).

2. In the proof of Theorem 5.3.1, prove in all detail that the elements
$1 + V, x + V, \ldots, x^{n-1} + V$ form a basis of E over F.

3. Prove Lemma 5.3.3 in all detail.

4. Show that $\tau^{**}$ in Lemma 5.3.4 is well defined and is an isomorphism
of F[x]/(f(x)) onto F'[t]/(f'(t)).

5. In Example 3 at the end of this section prove that $F(\omega)$ is the splitting
field of $x^4 + x^2 + 1$.

6. Let F be the field of rational numbers. Determine the degrees of the
splitting fields of the following polynomials over F.

(a) $x^4 + 1$. (b) $x^6 + 1$.

(c) $x^4 - 2$. (d) $x^5 - 1$.

(e) $x^6 + x^3 + 1$.

7. If p is a prime number, prove that the splitting field over F, the field
of rational numbers, of the polynomial $x^p - 1$ is of degree p - 1.

**8. If n > 1, prove that the splitting field of $x^n - 1$ over the field of
rational numbers is of degree $\varphi(n)$ where $\varphi$ is the Euler $\varphi$-function.
(This is a well-known theorem. I know of no easy solution, so don’t
be disappointed if you fail to get it. If you get an easy proof, I would
like to see it. This problem occurs in an equivalent form as Problem 15,
Section 5.6.)

*9. If F is the field of rational numbers, find necessary and sufficient
conditions on a and b so that the splitting field of $x^3 + ax + b$ has
degree exactly 3 over F.

10. Let p be a prime number and let $F = J_p$, the field of integers mod p.
(a) Prove that there is an irreducible polynomial of degree 2 over F.
(b) Use this polynomial to construct a field with $p^2$ elements.

*(c) Prove that any two irreducible polynomials of degree 2 over F
lead to isomorphic fields with $p^2$ elements.

11. If E is an extension of F and if $f(x) \in F[x]$ and if $\phi$ is an
automorphism of E leaving every element of F fixed, prove that $\phi$ must take a
root of f(x) lying in E into a root of f(x) in E.

12. Prove that $F(\sqrt[3]{2})$, where F is the field of rational numbers, has no
automorphisms other than the identity automorphism.

13. Using the result of Problem 11, prove that if the complex number
$\alpha$ is a root of the polynomial p(x) having real coefficients then $\bar{\alpha}$, the
complex conjugate of $\alpha$, is also a root of p(x).

14. Using the result of Problem 11, prove that if m is an integer which is
not a perfect square and if $\alpha + \beta\sqrt{m}$ ($\alpha, \beta$ rational) is the root of a
polynomial p(x) having rational coefficients, then $\alpha - \beta\sqrt{m}$ is also a
root of p(x).

*15. If F is the field of real numbers, prove that if $\phi$ is an automorphism
of F, then $\phi$ leaves every element of F fixed.

16. (a) Find all real quaternions $t = \alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ satisfying
$t^2 = -1$.

*(b) For a t as in part (a) prove we can find a real quaternion s such
that $sts^{-1} = i$.

5.4 Construction with Straightedge and Compass 

We pause in our general development to examine some implications of the 
results obtained so far in some familiar, geometric situations. 

A real number $\alpha$ is said to be a constructible number if by the use of
straightedge and compass alone we can construct a line segment of length $\alpha$. We
assume that we are given some fundamental unit length. Recall that from
high-school geometry we can construct with a straightedge and compass a
line perpendicular to and a line parallel to a given line through a given
point. From this it is an easy exercise (see Problem 1) to prove that if
$\alpha$ and $\beta$ are constructible numbers then so are $\alpha \pm \beta$, $\alpha\beta$, and when $\beta \ne 0$,
$\alpha/\beta$. Therefore, the set of constructible numbers forms a subfield, W, of the
field of real numbers.

In particular, since 1 e W, W must contain F 0 , the field of rational 
numbers. We wish to study the relation of W to the rational field. 

Since we shall have many occasions to use the phrase “construct by
straightedge and compass” (and variants thereof) the words construct,
constructible, construction, will always mean by straightedge and compass.

If $w \in W$, we can reach w from the rational field by a finite number of
constructions.

Let F be any subfield of the field of real numbers. Consider all the points
(x, y) in the real Euclidean plane both of whose coordinates x and y are in
F; we call the set of these points the plane of F. Any straight line joining two
points in the plane of F has an equation of the form ax + by + c = 0
where a, b, c are all in F (see Problem 2). Moreover, any circle having as
center a point in the plane of F and having as radius an element of F has
an equation of the form $x^2 + y^2 + ax + by + c = 0$, where all of a, b, c
are in F (see Problem 3). We call such lines and circles lines and circles
in F.

Given two lines in F which intersect in the real plane, then their
intersection point is a point in the plane of F (see Problem 4). On the other hand,
the intersection of a line in F and a circle in F need not yield a point in the
plane of F. But, using the fact that the equation of a line in F is of the form
ax + by + c = 0 and that of a circle in F is of the form $x^2 + y^2 + dx +
ey + f = 0$, where a, b, c, d, e, f are all in F, we can show that when a line
and circle of F intersect in the real plane, they intersect either in a point in
the plane of F or in the plane of $F(\sqrt{\gamma})$ for some positive $\gamma$ in F (see Problem
5). Finally, the intersection of two circles in F can be realized as that of
a line in F and a circle in F, for if these two circles are $x^2 + y^2 + a_1x +
b_1y + c_1 = 0$ and $x^2 + y^2 + a_2x + b_2y + c_2 = 0$, then their intersection
is the intersection of either of these with the line $(a_1 - a_2)x + (b_1 - b_2)y +
(c_1 - c_2) = 0$, so also yields a point either in the plane of F or of $F(\sqrt{\gamma})$
for some positive $\gamma$ in F.
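
To see where the square root comes from, one can eliminate a variable between the line and the circle and look at the discriminant of the resulting quadratic. The sketch below is one way to do this over the rationals and is not taken from the text; the function name `line_circle_discriminant` is hypothetical, and it assumes a and b are not both zero. The returned value plays the role of $\gamma$: the intersection points have coordinates in $F(\sqrt{\gamma})$, and they are real exactly when $\gamma \ge 0$.

```python
from fractions import Fraction

def line_circle_discriminant(a, b, c, d, e, f):
    """Discriminant of the quadratic obtained by intersecting
    the line  a x + b y + c = 0  with the circle
    x^2 + y^2 + d x + e y + f = 0  (all coefficients rational)."""
    a, b, c, d, e, f = (Fraction(v) for v in (a, b, c, d, e, f))
    if b != 0:
        # Substitute y = -(a x + c) / b and clear the denominator b^2:
        # A x^2 + B x + C = 0.
        A = a * a + b * b
        B = 2 * a * c + d * b * b - e * a * b
        C = c * c - e * b * c + f * b * b
    else:
        # Vertical line x = -c / a; the quadratic is in y.
        x = -c / a
        A, B, C = Fraction(1), e, x * x + d * x + f
    return B * B - 4 * A * C
```

For the line x - y = 0 and the unit circle, `line_circle_discriminant(1, -1, 0, 0, 0, -1)` is 8, so the intersection points lie in the plane of $F(\sqrt{8}) = F(\sqrt{2})$. For the line x = 1 and the circle $x^2 + y^2 = 2$ the value is 4, a perfect square, and the points (1, 1), (1, -1) are already in the plane of F.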

Thus lines and circles of F lead us to points either in F or in quadratic
extensions of F. If we now are in $F(\sqrt{\gamma_1})$ for some quadratic extension of
F, then lines and circles in $F(\sqrt{\gamma_1})$ intersect in points in the plane of
$F(\sqrt{\gamma_1}, \sqrt{\gamma_2})$ where $\gamma_2$ is a positive number in $F(\sqrt{\gamma_1})$. A point is
constructible from F if we can find real numbers $\lambda_1, \ldots, \lambda_n$, such that $\lambda_1^2 \in F$,
$\lambda_2^2 \in F(\lambda_1)$, $\lambda_3^2 \in F(\lambda_1, \lambda_2)$, ..., $\lambda_n^2 \in F(\lambda_1, \ldots, \lambda_{n-1})$, such that the
point is in the plane of $F(\lambda_1, \ldots, \lambda_n)$. Conversely, if $\gamma \in F$ is such that
$\sqrt{\gamma}$ is real then we can realize $\sqrt{\gamma}$ as an intersection of lines and circles in F
(see Problem 6). Thus a point is constructible from F if and only if we
can find a finite number of real numbers $\lambda_1, \ldots, \lambda_n$, such that

1. $[F(\lambda_1):F] = 1$ or 2;

2. $[F(\lambda_1, \ldots, \lambda_i):F(\lambda_1, \ldots, \lambda_{i-1})] = 1$ or 2 for i = 1, 2, ..., n;

and such that our point lies in the plane of $F(\lambda_1, \ldots, \lambda_n)$.

We have defined a real number $\alpha$ to be constructible if by use of
straightedge and compass we can construct a line segment of length $\alpha$. But this
translates, in terms of the discussion above, into: $\alpha$ is constructible if starting
from the plane of the rational numbers, $F_0$, we can imbed $\alpha$ in a field
obtained from $F_0$ by a finite number of quadratic extensions. This is

THEOREM 5.4.1 The real number $\alpha$ is constructible if and only if we can find
a finite number of real numbers $\lambda_1, \ldots, \lambda_n$ such that

1. $\lambda_1^2 \in F_0$,

2. $\lambda_i^2 \in F_0(\lambda_1, \ldots, \lambda_{i-1})$ for i = 1, 2, ..., n,

such that $\alpha \in F_0(\lambda_1, \ldots, \lambda_n)$.

However, we can compute the degree of $F_0(\lambda_1, \ldots, \lambda_n)$ over $F_0$, for by
Theorem 5.1.1

$[F_0(\lambda_1, \ldots, \lambda_n):F_0] = [F_0(\lambda_1, \ldots, \lambda_n):F_0(\lambda_1, \ldots, \lambda_{n-1})] \cdots$
$\times\, [F_0(\lambda_1, \ldots, \lambda_i):F_0(\lambda_1, \ldots, \lambda_{i-1})] \cdots$
$\times\, [F_0(\lambda_1):F_0].$

Since each term in the product is either 1 or 2, we get that
$[F_0(\lambda_1, \ldots, \lambda_n):F_0] = 2^r$,

and thus the

COROLLARY 1 If $\alpha$ is constructible then $\alpha$ lies in some extension of the rationals
of degree a power of 2.

If $\alpha$ is constructible, by Corollary 1 above, there is a subfield K of the real
field such that $\alpha \in K$ and such that $[K:F_0] = 2^r$. However, $F_0(\alpha) \subset K$,
whence by the corollary to Theorem 5.1.1 $[F_0(\alpha):F_0] \mid [K:F_0] = 2^r$; thereby
$[F_0(\alpha):F_0]$ is also a power of 2. However, if $\alpha$ satisfies an irreducible
polynomial of degree k over $F_0$, we have proved in Theorem 5.1.3 that
$[F_0(\alpha):F_0] = k$. Thus we get the important criterion for nonconstructibility

COROLLARY 2 If the real number $\alpha$ satisfies an irreducible polynomial over
the field of rational numbers of degree k, and if k is not a power of 2, then $\alpha$ is not
constructible.
This last corollary enables us to settle the ancient problem of trisecting 
an angle by straightedge and compass, for we prove 

THEOREM 5.4.2 It is impossible, by straightedge and compass alone, to trisect
60°.

Proof. If we could trisect 60° by straightedge and compass, then the
length $\alpha = \cos 20^\circ$ would be constructible. At this point, let us recall the
identity $\cos 3\theta = 4\cos^3\theta - 3\cos\theta$. Putting $\theta = 20^\circ$ and remembering
that $\cos 60^\circ = \frac{1}{2}$, we obtain $4\alpha^3 - 3\alpha = \frac{1}{2}$, whence $8\alpha^3 - 6\alpha - 1 = 0$.
Thus $\alpha$ is a root of the polynomial $8x^3 - 6x - 1$ over the rational field.
However, this polynomial is irreducible over the rational field (Problem
7(a)), and since its degree is 3, which certainly is not a power of 2, by
Corollary 2 to Theorem 5.4.1, $\alpha$ is not constructible. Thus 60° cannot be
trisected by straightedge and compass.

Another ancient problem is that of duplicating the cube, that is, of
constructing a cube whose volume is twice that of a given cube. If the
original cube is the unit cube, this entails constructing a length $\alpha$ such that
$\alpha^3 = 2$. Since the polynomial $x^3 - 2$ is irreducible over the rationals
(Problem 7(b)), by Corollary 2 to Theorem 5.4.1, $\alpha$ is not constructible.
Thus

THEOREM 5.4.3 By straightedge and compass it is impossible to duplicate the
cube.

We wish to exhibit yet another geometric figure which cannot be constructed
by straightedge and compass, namely, the regular septagon. To
carry out such a construction would require the constructibility of $\alpha =
2\cos(2\pi/7)$. However, we claim that $\alpha$ satisfies $x^3 + x^2 - 2x - 1$
(Problem 8) and that this polynomial is irreducible over the field of rational
numbers (Problem 7(c)). Thus again using Corollary 2 to Theorem 5.4.1
we obtain

THEOREM 5.4.4 It is impossible to construct a regular septagon by straightedge
and compass.
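
The three impossibility results above each rest on a cubic being irreducible over the rationals (Problem 7). As a minimal sketch, and not part of the text's own development, the fragment below relies on two standard facts: a cubic with integer coefficients is reducible over the rationals only if it has a rational root, and any rational root $p/q$ in lowest terms has $p$ dividing the constant term and $q$ dividing the leading coefficient. The names `rational_roots` and `is_power_of_two` are illustrative assumptions.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_roots(coeffs):
    """Rational roots of an integer polynomial.

    coeffs lists the coefficients from the highest degree down; the
    constant term is assumed to be nonzero.
    """
    lead, const = coeffs[0], coeffs[-1]
    roots = set()
    for p in divisors(const):
        for q in divisors(lead):
            for cand in (Fraction(p, q), Fraction(-p, q)):
                value = Fraction(0)
                for c in coeffs:  # Horner evaluation at the candidate
                    value = value * cand + c
                if value == 0:
                    roots.add(cand)
    return roots


def is_power_of_two(k):
    """True when the positive integer k is 1, 2, 4, 8, ..."""
    return k >= 1 and (k & (k - 1)) == 0


# The cubics of Theorems 5.4.2, 5.4.3 and 5.4.4, highest degree first.
CUBICS = {
    "trisecting 60 degrees": [8, 0, -6, -1],
    "duplicating the cube": [1, 0, 0, -2],
    "regular septagon": [1, 1, -2, -1],
}
```

Under these assumptions, an empty result from `rational_roots` for each entry of `CUBICS` means the cubic is irreducible, and since `is_power_of_two(3)` is false, Corollary 2 applies in all three cases.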

Problems

1. Prove that if $\alpha$, $\beta$ are constructible, then so are $\alpha \pm \beta$, $\alpha\beta$, and $\alpha/\beta$
(when $\beta \neq 0$).

2. Prove that a line in $F$ has an equation of the form $ax + by + c = 0$
with $a$, $b$, $c$ in $F$.

3. Prove that a circle in $F$ has an equation of the form
$x^2 + y^2 + ax + by + c = 0$,
with $a$, $b$, $c$ in $F$.

4. Prove that two lines in $F$, which intersect in the real plane, intersect
at a point in the plane of $F$.

5. Prove that a line in $F$ and a circle in $F$ which intersect in the real
plane do so at a point either in the plane of $F$ or in the plane of $F(\sqrt{\gamma})$
where $\gamma$ is a positive number in $F$.

6. If $\gamma \in F$ is positive, prove that $\sqrt{\gamma}$ is realizable as an intersection of
lines and circles in $F$.

7. Prove that the following polynomials are irreducible over the field of
rational numbers.

(a) $8x^3 - 6x - 1$.

(b) $x^3 - 2$.

(c) $x^3 + x^2 - 2x - 1$.

8. Prove that $2\cos(2\pi/7)$ satisfies $x^3 + x^2 - 2x - 1$. (Hint: Use
$2\cos(2\pi/7) = e^{2\pi i/7} + e^{-2\pi i/7}$.)

9. Prove that the regular pentagon is constructible. 

10. Prove that the regular hexagon is constructible. 

11. Prove that the regular 15-gon is constructible. 

12. Prove that it is possible to trisect 72°. 

13. Prove that a regular 9-gon is not constructible. 

*14. Prove a regular 17-gon is constructible. 

5.5 More about Roots 

We return to the general exposition. Let $F$ be any field and, as usual, let
$F[x]$ be the ring of polynomials in $x$ over $F$.

DEFINITION If $f(x) = \alpha_0x^n + \alpha_1x^{n-1} + \cdots + \alpha_ix^{n-i} + \cdots + \alpha_{n-1}x + \alpha_n$
in $F[x]$, then the derivative of $f(x)$, written as $f'(x)$, is the polynomial
$f'(x) = n\alpha_0x^{n-1} + (n-1)\alpha_1x^{n-2} + \cdots + (n-i)\alpha_ix^{n-i-1} + \cdots + \alpha_{n-1}$
in $F[x]$.

To make this definition or to prove the basic formal properties of the 
derivatives, as applied to polynomials, does not require the concept of a 
limit. However, since the field F is arbitrary, we might expect some strange 
things to happen. 

At the end of Section 5.2, we defined what is meant by the characteristic
of a field. Let us recall it now. A field $F$ is said to be of characteristic 0 if
$ma \neq 0$ for $a \neq 0$ in $F$ and $m > 0$, an integer. If $ma = 0$ for some $m > 0$
and some $a \neq 0 \in F$, then $F$ is said to be of finite characteristic. In this
second case, the characteristic of $F$ is defined to be the smallest positive
integer $p$ such that $pa = 0$ for all $a \in F$. It turned out that if $F$ is of finite
characteristic then its characteristic $p$ is a prime number.

We return to the question of the derivative. Let $F$ be a field of characteristic
$p \neq 0$. In this case, the derivative of the polynomial $x^p$ is $px^{p-1} = 0$.
Thus the usual result from the calculus that a polynomial whose derivative
is 0 must be a constant no longer need hold true. However, if the characteristic
of $F$ is 0 and if $f'(x) = 0$ for $f(x) \in F[x]$, it is indeed true that
$f(x) = \alpha \in F$ (see Problem 1). Even when the characteristic of $F$ is
$p \neq 0$, we can still describe the polynomials with zero derivative; if
$f'(x) = 0$, then $f(x)$ is a polynomial in $x^p$ (see Problem 2).
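
The behaviour just described can be tried out on a small case. The following is a minimal sketch, assuming the field is either the rationals (integer coefficients suffice here) or the integers modulo a prime $p$; the helper name `formal_derivative` is an illustrative assumption and does not come from the text.

```python
def formal_derivative(coeffs, p=0):
    """Formal derivative of a polynomial, with no limits involved.

    coeffs[k] is the coefficient of x**k.  With p == 0 the coefficients
    are treated as integers (characteristic 0); with p a prime they are
    reduced modulo p (characteristic p).  The zero polynomial is [].
    """
    # Term k*c*x**(k-1) for every term c*x**k; the constant term drops out.
    deriv = [k * c for k, c in enumerate(coeffs)][1:]
    if p:
        deriv = [c % p for c in deriv]
    while deriv and deriv[-1] == 0:  # strip vanished leading terms
        deriv.pop()
    return deriv
```

With these conventions `formal_derivative([0, 0, 0, 1], p=3)` is the zero polynomial, since $x^3$ has derivative $3x^2 = 0$ in characteristic 3, whereas the same input with `p=0` gives $3x^2$. A polynomial in $x^3$ such as $1 + 2x^3 + x^6$ likewise has zero derivative modulo 3.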
We now prove the analogs of the formal rules of differentiation that we
know so well.

LEMMA 5.5.1 For any $f(x), g(x) \in F[x]$ and any $\alpha \in F$,

1. $(f(x) + g(x))' = f'(x) + g'(x)$.

2. $(\alpha f(x))' = \alpha f'(x)$.

3. $(f(x)g(x))' = f'(x)g(x) + f(x)g'(x)$.

Proof. The proofs of parts 1 and 2 are extremely easy and are left as
exercises. To prove part 3, note that from parts 1 and 2 it is enough to
prove it in the highly special case $f(x) = x^i$ and $g(x) = x^j$ where both
$i$ and $j$ are positive. But then $f(x)g(x) = x^{i+j}$, whence $(f(x)g(x))' =
(i + j)x^{i+j-1}$; however, $f'(x)g(x) = ix^{i-1}x^j = ix^{i+j-1}$ and $f(x)g'(x) =
jx^ix^{j-1} = jx^{i+j-1}$; consequently, $f'(x)g(x) + f(x)g'(x) = (i + j)x^{i+j-1} =
(f(x)g(x))'$.

Recall that in elementary calculus the equivalence is shown between the 
existence of a multiple root of a function and the simultaneous vanishing of 
the function and its derivative at a given point. Even in our setting, where 
F is an arbitrary field, such an interrelation exists. 

LEMMA 5.5.2 The polynomial $f(x) \in F[x]$ has a multiple root if and only if
$f(x)$ and $f'(x)$ have a nontrivial (that is, of positive degree) common factor.

Proof. Before proving the lemma proper, a related remark is in order,
namely, if $f(x)$ and $g(x)$ in $F[x]$ have a nontrivial common factor in $K[x]$,
for $K$ an extension of $F$, then they have a nontrivial common factor in $F[x]$.
For, were they relatively prime as elements in $F[x]$, then we would be
able to find two polynomials $a(x)$ and $b(x)$ in $F[x]$ such that $a(x)f(x) +
b(x)g(x) = 1$. Since this relation also holds for those elements viewed
as elements of $K[x]$, in $K[x]$ they would have to be relatively prime.

Now to the lemma itself. From the remark just made, we may assume,
without loss of generality, that the roots of $f(x)$ all lie in $F$ (otherwise extend
$F$ to $K$, the splitting field of $f(x)$). If $f(x)$ has a multiple root $\alpha$, then
$f(x) = (x - \alpha)^mq(x)$, where $m > 1$. However, as is easily computed,
$((x - \alpha)^m)' = m(x - \alpha)^{m-1}$ whence, by Lemma 5.5.1, $f'(x) =
(x - \alpha)^mq'(x) + m(x - \alpha)^{m-1}q(x) = (x - \alpha)r(x)$, since $m > 1$. But this
says that $f(x)$ and $f'(x)$ have the common factor $x - \alpha$, thereby proving
the lemma in one direction.

On the other hand, if $f(x)$ has no multiple root then $f(x) =
(x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$ where the $\alpha_i$'s are all distinct (we are
supposing $f(x)$ to be monic). But then

$$f'(x) = \sum_{i=1}^{n} (x - \alpha_1) \cdots \widehat{(x - \alpha_i)} \cdots (x - \alpha_n)$$

where the hat denotes that the term is omitted. We claim no root of $f(x)$ is a
root of $f'(x)$, for

$$f'(\alpha_i) = \prod_{j \neq i} (\alpha_i - \alpha_j) \neq 0,$$

since the roots are all distinct. However, if $f(x)$ and $f'(x)$ have a nontrivial
common factor, they have a common root, namely, any root of this common
factor. The net result is that $f(x)$ and $f'(x)$ have no nontrivial common
factor, and so the lemma has been proved in the other direction.
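
Lemma 5.5.2 turns the search for multiple roots into a greatest common divisor computation that never leaves $F[x]$. The following is a minimal sketch over the rational field, assuming polynomials are stored as coefficient lists with the lowest degree first; the function names are illustrative assumptions, and the Euclidean algorithm used is the usual one for polynomials over a field.

```python
from fractions import Fraction


def trim(poly):
    """Copy of poly without trailing zero coefficients."""
    poly = list(poly)
    while poly and poly[-1] == 0:
        poly.pop()
    return poly


def derivative(poly):
    """Formal derivative; poly[k] is the coefficient of x**k."""
    return trim([k * c for k, c in enumerate(poly)][1:])


def poly_mod(a, b):
    """Remainder of a on division by the nonzero polynomial b."""
    a, b = trim(a), trim(b)
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        for k, c in enumerate(b):
            a[shift + k] -= factor * c  # cancels the leading term of a
        a = trim(a)
    return a


def poly_gcd(a, b):
    """Monic greatest common divisor by the Euclidean algorithm."""
    a, b = trim(a), trim(b)
    while b:
        a, b = b, poly_mod(a, b)
    return [c / a[-1] for c in a]


def has_multiple_root(coeffs):
    """Lemma 5.5.2: test whether f and f' share a factor of positive degree."""
    f = [Fraction(c) for c in coeffs]
    return len(poly_gcd(f, derivative(f))) > 1
```

For instance, $x^3 - 3x + 2 = (x - 1)^2(x + 2)$ is passed as `[2, -3, 0, 1]`; its greatest common divisor with its derivative is $x - 1$, so a multiple root is detected without any root being computed.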

COROLLARY 1 If $f(x) \in F[x]$ is irreducible, then

1. If the characteristic of $F$ is 0, $f(x)$ has no multiple roots.

2. If the characteristic of $F$ is $p \neq 0$, $f(x)$ has a multiple root only if it is of the
form $f(x) = g(x^p)$.

Proof. Since $f(x)$ is irreducible, its only factors in $F[x]$ are 1 and $f(x)$.
If $f(x)$ has a multiple root, then $f(x)$ and $f'(x)$ have a nontrivial common
factor by the lemma, hence $f(x) \mid f'(x)$. However, since the degree of $f'(x)$
is less than that of $f(x)$, the only possible way that this can happen is for
$f'(x)$ to be 0. In characteristic 0 this implies that $f(x)$ is a constant, which
has no roots; in characteristic $p \neq 0$, this forces $f(x) = g(x^p)$.

We shall return in a moment to discuss the implications of Corollary 1 
more fully. But first, for later use in Chapter 7 in our treatment of finite 
fields, we prove the rather special 

COROLLARY 2 If $F$ is a field of characteristic $p \neq 0$, then the polynomial
$x^{p^n} - x \in F[x]$, for $n \geq 1$, has distinct roots.

Proof. The derivative of $x^{p^n} - x$ is $p^nx^{p^n-1} - 1 = -1$, since $F$ is of
characteristic $p$. Therefore, $x^{p^n} - x$ and its derivative are certainly relatively
prime, which, by the lemma, implies that $x^{p^n} - x$ has no multiple
roots.
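
As a small illustration of Corollary 2 in the special case $n = 1$, assume $F$ is the field of integers modulo a prime $p$. There the roots of $x^p - x$ can simply be listed, and they turn out to be all $p$ elements of the field, each occurring once. This is only a sketch of that special case; the helper names are illustrative assumptions.

```python
def roots_mod_p(coeffs, p):
    """Roots in the integers modulo p; coeffs run from the highest degree down."""
    roots = []
    for a in range(p):
        value = 0
        for c in coeffs:  # Horner evaluation modulo p
            value = (value * a + c) % p
        if value == 0:
            roots.append(a)
    return roots


def x_to_the_p_minus_x(p):
    """Coefficients of x**p - x, highest degree first."""
    return [1] + [0] * (p - 2) + [-1, 0]
```

By contrast, `roots_mod_p([1, 0, 0, 0], 3)` finds only the root 0 for $x^3$, a polynomial whose derivative vanishes in characteristic 3.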

Corollary 1 does not rule out the possibility that in characteristic $p \neq 0$
an irreducible polynomial might have multiple roots. To clinch matters,
we exhibit an example where this actually happens. Let $F_0$ be a field of
characteristic 2 and let $F = F_0(x)$ be the field of rational functions in $x$
over $F_0$. We claim that the polynomial $t^2 - x$ in $F[t]$ is irreducible over $F$
and that its roots are equal. To prove irreducibility we must show that
there is no rational function in $F_0(x)$ whose square is $x$; this is the content
of Problem 4. To see that $t^2 - x$ has a multiple root, notice that its derivative
(the derivative is with respect to $t$; for $x$, being in $F$, is considered as a
constant) is $2t = 0$. Of course, the analogous example works for any prime
characteristic.

Now that the possibility has been seen to be an actuality, it points out
a sharp difference between the case of characteristic 0 and that of characteristic
$p$. The presence of irreducible polynomials with multiple roots in
the latter case leads to many interesting, but at the same time complicating,
subtleties. These require a more elaborate and sophisticated treatment
which we prefer to avoid at this stage of the game. Therefore, we make the
flat assumption for the rest of this chapter that all fields occurring in the text material
proper are fields of characteristic 0.

DEFINITION The extension $K$ of $F$ is a simple extension of $F$ if $K = F(\alpha)$
for some $\alpha$ in $K$.

In characteristic 0 (or in properly conditioned extensions in characteristic
$p \neq 0$; see Problem 14) all finite extensions are realizable as simple extensions.
This result is

THEOREM 5.5.1 If $F$ is of characteristic 0 and if $a$, $b$, are algebraic over $F$,
then there exists an element $c \in F(a, b)$ such that $F(a, b) = F(c)$.

Proof. Let $f(x)$ and $g(x)$, of degrees $m$ and $n$, be the irreducible polynomials
over $F$ satisfied by $a$ and $b$, respectively. Let $K$ be an extension
of $F$ in which both $f(x)$ and $g(x)$ split completely. Since the characteristic
of $F$ is 0, all the roots of $f(x)$ are distinct, as are all those of $g(x)$. Let the
roots of $f(x)$ be $a = a_1, a_2, \ldots, a_m$ and those of $g(x)$, $b = b_1, b_2, \ldots, b_n$.

If $j \neq 1$, then $b_j \neq b_1 = b$, hence the equation $a_i + \lambda b_j = a_1 + \lambda b_1 =
a + \lambda b$ has only one solution $\lambda$ in $K$, namely,

$$\lambda = \frac{a_i - a}{b - b_j}.$$

Since $F$ is of characteristic 0 it has an infinite number of elements, so we
can find an element $\gamma \in F$ such that $a_i + \gamma b_j \neq a + \gamma b$ for all $i$ and for all
$j \neq 1$. Let $c = a + \gamma b$; our contention is that $F(c) = F(a, b)$. Since
$c \in F(a, b)$, we certainly do have that $F(c) \subset F(a, b)$. We will now show
that both $a$ and $b$ are in $F(c)$ from which it will follow that $F(a, b) \subset F(c)$.

Now $b$ satisfies the polynomial $g(x)$ over $F$, hence satisfies $g(x)$ considered
as a polynomial over $K = F(c)$. Moreover, if $h(x) = f(c - \gamma x)$ then
$h(x) \in K[x]$ and $h(b) = f(c - \gamma b) = f(a) = 0$, since $a = c - \gamma b$. Thus in
some extension of $K$, $h(x)$ and $g(x)$ have $x - b$ as a common factor. We
assert that $x - b$ is in fact their greatest common divisor. For, if $b_j \neq b$
is another root of $g(x)$, then $h(b_j) = f(c - \gamma b_j) \neq 0$, since by our choice
of $\gamma$, $c - \gamma b_j$ for $j \neq 1$ avoids all roots $a_i$ of $f(x)$. Also, since $(x - b)^2 \nmid g(x)$,
$(x - b)^2$ cannot divide the greatest common divisor of $h(x)$ and $g(x)$. Thus
$x - b$ is the greatest common divisor of $h(x)$ and $g(x)$ over some extension
of $K$. But then they have a nontrivial greatest common divisor over $K$,
which must be a divisor of $x - b$. Since the degree of $x - b$ is 1, we see
that the greatest common divisor of $g(x)$ and $h(x)$ in $K[x]$ is exactly $x - b$.
Thus $x - b \in K[x]$, whence $b \in K$; remembering that $K = F(c)$, we obtain
that $b \in F(c)$. Since $a = c - \gamma b$, and since $b, c \in F(c)$, $\gamma \in F \subset F(c)$, we
get that $a \in F(c)$, whence $F(a, b) \subset F(c)$. The two opposite containing
relations combine to yield $F(a, b) = F(c)$.
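
To see the theorem at work in one concrete case, take $F$ to be the rational field, $a = \sqrt{2}$, $b = \sqrt{3}$ and $\gamma = 1$, so that $c = \sqrt{2} + \sqrt{3}$. This example is not from the text; the sketch below is an assumption-laden illustration that stores an element $r_0 + r_1\sqrt{2} + r_2\sqrt{3} + r_3\sqrt{6}$ as the tuple $(r_0, r_1, r_2, r_3)$ and checks by exact arithmetic that $\sqrt{2}$ and $\sqrt{3}$ are rational polynomials in $c$. The names `multiply` and `combine` are illustrative.

```python
from fractions import Fraction


def multiply(u, v):
    """Product in Q(sqrt2, sqrt3) with respect to the basis 1, sqrt2, sqrt3, sqrt6."""
    a0, a1, a2, a3 = u
    b0, b1, b2, b3 = v
    return (
        a0 * b0 + 2 * a1 * b1 + 3 * a2 * b2 + 6 * a3 * b3,
        a0 * b1 + a1 * b0 + 3 * (a2 * b3 + a3 * b2),  # sqrt3 * sqrt6 = 3 sqrt2
        a0 * b2 + a2 * b0 + 2 * (a1 * b3 + a3 * b1),  # sqrt2 * sqrt6 = 2 sqrt3
        a0 * b3 + a3 * b0 + a1 * b2 + a2 * b1,
    )


def combine(pairs):
    """Rational linear combination of elements, given as (coefficient, element) pairs."""
    return tuple(sum(coef * elem[k] for coef, elem in pairs) for k in range(4))


ONE = (1, 0, 0, 0)
c = (0, 1, 1, 0)            # c = sqrt2 + sqrt3, that is, gamma = 1
c2 = multiply(c, c)         # 5 + 2 sqrt6
c3 = multiply(c2, c)        # 11 sqrt2 + 9 sqrt3
c4 = multiply(c2, c2)       # 49 + 20 sqrt6

# sqrt2 = (c^3 - 9c)/2 and sqrt3 = (11c - c^3)/2, so both lie in F(c).
sqrt2_from_c = combine([(Fraction(1, 2), c3), (Fraction(-9, 2), c)])
sqrt3_from_c = combine([(Fraction(-1, 2), c3), (Fraction(11, 2), c)])

# c is a root of x^4 - 10x^2 + 1.
relation = combine([(1, c4), (-10, c2), (1, ONE)])
```

The choice $\gamma = 1$ is admissible here because $\pm\sqrt{2} - \sqrt{3}$ never equals $\sqrt{2} + \sqrt{3}$, which is the condition imposed on $\gamma$ in the proof.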

A simple induction argument extends the result from 2 elements to any
finite number, that is, if $\alpha_1, \ldots, \alpha_n$ are algebraic over $F$, then there is an
element $c \in F(\alpha_1, \ldots, \alpha_n)$ such that $F(c) = F(\alpha_1, \ldots, \alpha_n)$. Thus the

COROLLARY Any finite extension of a field of characteristic 0 is a simple extension. 

Problems 

1. If $F$ is of characteristic 0 and $f(x) \in F[x]$ is such that $f'(x) = 0$,
prove that $f(x) = \alpha \in F$.

2. If $F$ is of characteristic $p \neq 0$ and if $f(x) \in F[x]$ is such that
$f'(x) = 0$, prove that $f(x) = g(x^p)$ for some polynomial $g(x) \in F[x]$.

3. Prove that $(f(x) + g(x))' = f'(x) + g'(x)$ and that $(\alpha f(x))' =
\alpha f'(x)$ for $f(x), g(x) \in F[x]$ and $\alpha \in F$.

4. Prove that there is no rational function in F(x) such that its square is x. 

5. Complete the induction needed to establish the corollary to Theorem 
5.5.1. 

An element a in an extension K of F is called separable over F if it satisfies 
a polynomial over F having no multiple roots. An extension K of F is 
called separable over F if all its elements are separable over F. A field F 
is called perfect if all finite extensions of F are separable. 

6. Show that any field of characteristic 0 is perfect.

7. (a) If $F$ is of characteristic $p \neq 0$ show that for $a, b \in F$, $(a + b)^{p^m} =
a^{p^m} + b^{p^m}$.

(b) If $F$ is of characteristic $p \neq 0$ and if $K$ is an extension of $F$ let
$T = \{a \in K \mid a^{p^n} \in F \text{ for some } n\}$. Prove that $T$ is a subfield of
$K$.

8. If $K$, $T$, $F$ are as in Problem 7(b) show that any automorphism of $K$
leaving every element of $F$ fixed also leaves every element of $T$ fixed.

*9. Show that a field $F$ of characteristic $p \neq 0$ is perfect if and only if
for every $a \in F$ we can find a $b \in F$ such that $b^p = a$.

10. Using the result of Problem 9, prove that any finite field is perfect.

11. If $K$ is an extension of $F$ prove that the set of elements in $K$ which
are separable over $F$ forms a subfield of $K$.

12. If $F$ is of characteristic $p \neq 0$ and if $K$ is a finite extension of $F$,
prove that given $a \in K$ either $a^{p^n} \in F$ for some $n$ or we can find an
integer $m$ such that $a^{p^m} \notin F$ and is separable over $F$.

13. If $K$ and $F$ are as in Problem 12, and if no element which is in $K$
but not in $F$ is separable over $F$, prove that given $a \in K$ we can find
an integer $n$, depending on $a$, such that $a^{p^n} \in F$.

14. If $K$ is a finite, separable extension of $F$ prove that $K$ is a simple
extension of $F$.

15. If one of $a$ or $b$ is separable over $F$, prove that $F(a, b)$ is a simple
extension of $F$.


5.6 The Elements of Galois Theory 

Given a polynomial $p(x)$ in $F[x]$, the polynomial ring in $x$ over $F$, we shall
associate with $p(x)$ a group, called the Galois group of $p(x)$. There is a very
close relationship between the roots of a polynomial and its Galois group;
in fact, the Galois group will turn out to be a certain permutation group
of the roots of the polynomial. We shall make a study of these ideas in this,
and in the next, section.

The means of introducing this group will be through the splitting field
of $p(x)$ over $F$, the Galois group of $p(x)$ being defined as a certain group of
automorphisms of this splitting field. This accounts for our concern, in so
many of the theorems to come, with the automorphisms of a field. A
beautiful duality, expressed in the fundamental theorem of the Galois theory
(Theorem 5.6.6), exists between the subgroups of the Galois group and the
subfields of the splitting field. From this we shall eventually derive a
condition for the solvability by means of radicals of the roots of a polynomial
in terms of the algebraic structure of its Galois group. From this will follow
the classical result of Abel that the general polynomial of degree 5 is not
solvable by radicals. Along the way we shall also derive, as side results,
theorems of great interest in their own right. One such will be the fundamental
theorem on symmetric functions. Our approach to the subject is
founded on the treatment given it by Artin.

Recall that we are assuming that all our fields are of characteristic 0,
hence we can (and shall) make free use of Theorem 5.5.1 and its corollary.
By an automorphism of the field $K$ we shall mean, as usual, a mapping $\sigma$
of $K$ onto itself such that $\sigma(a + b) = \sigma(a) + \sigma(b)$ and $\sigma(ab) = \sigma(a)\sigma(b)$
for all $a, b \in K$. Two automorphisms $\sigma$ and $\tau$ of $K$ are said to be distinct
if $\sigma(a) \neq \tau(a)$ for some element $a$ in $K$.

We begin the material with

THEOREM 5.6.1 If $K$ is a field and if $\sigma_1, \ldots, \sigma_n$ are distinct automorphisms
of $K$, then it is impossible to find elements $a_1, \ldots, a_n$, not all 0, in $K$ such that

$$a_1\sigma_1(u) + a_2\sigma_2(u) + \cdots + a_n\sigma_n(u) = 0 \quad \text{for all } u \in K.$$

Proof. Suppose we could find a set of elements $a_1, \ldots, a_n$ in $K$, not all
0, such that $a_1\sigma_1(u) + \cdots + a_n\sigma_n(u) = 0$ for all $u \in K$. Then we could
find such a relation having as few nonzero terms as possible; on renumbering
we can assume that this minimal relation is

$$a_1\sigma_1(u) + \cdots + a_m\sigma_m(u) = 0 \qquad (1)$$

where $a_1, \ldots, a_m$ are all different from 0.

If $m$ were equal to 1 then $a_1\sigma_1(u) = 0$ for all $u \in K$, leading to $a_1 = 0$,
contrary to assumption. Thus we may assume that $m > 1$. Since the automorphisms
are distinct there is an element $c \in K$ such that $\sigma_1(c) \neq \sigma_m(c)$.
Since $cu \in K$ for all $u \in K$, relation (1) must also hold for $cu$, that is,
$a_1\sigma_1(cu) + a_2\sigma_2(cu) + \cdots + a_m\sigma_m(cu) = 0$ for all $u \in K$. Using the hypothesis
that the $\sigma$'s are automorphisms of $K$, this relation becomes

$$a_1\sigma_1(c)\sigma_1(u) + a_2\sigma_2(c)\sigma_2(u) + \cdots + a_m\sigma_m(c)\sigma_m(u) = 0. \qquad (2)$$

Multiplying relation (1) by $\sigma_1(c)$ and subtracting the result from (2)
yields

$$a_2(\sigma_2(c) - \sigma_1(c))\sigma_2(u) + \cdots + a_m(\sigma_m(c) - \sigma_1(c))\sigma_m(u) = 0. \qquad (3)$$

If we put $b_i = a_i(\sigma_i(c) - \sigma_1(c))$ for $i = 2, \ldots, m$, then the $b_i$ are in $K$,
$b_m = a_m(\sigma_m(c) - \sigma_1(c)) \neq 0$, since $a_m \neq 0$, and $\sigma_m(c) - \sigma_1(c) \neq 0$, yet
$b_2\sigma_2(u) + \cdots + b_m\sigma_m(u) = 0$ for all $u \in K$. This produces a shorter relation,
contrary to the choice made; thus the theorem is proved.

DEFINITION If $G$ is a group of automorphisms of $K$, then the fixed field
of $G$ is the set of all elements $a \in K$ such that $\sigma(a) = a$ for all $\sigma \in G$.

Note that this definition makes perfectly good sense even if G is not a 
group but is merely a set of automorphisms of K. However, the fixed field 
of a set of automorphisms and that of the group of automorphisms generated 
by this set (in the group of all automorphisms of K) are equal (Problem 1), 
hence we lose nothing by defining the concept just for groups of auto¬ 
morphisms. Besides, we shall only be interested in the fixed fields of groups 
of automorphisms. 

Having called the set, in the definition above, the fixed field of $G$, it
would be nice if this terminology were accurate. That it is we see in

LEMMA 5.6.1 The fixed field of $G$ is a subfield of $K$.

Proof. Let $a$, $b$ be in the fixed field of $G$. Thus for all $\sigma \in G$, $\sigma(a) = a$
and $\sigma(b) = b$. But then $\sigma(a \pm b) = \sigma(a) \pm \sigma(b) = a \pm b$ and $\sigma(ab) =
\sigma(a)\sigma(b) = ab$; hence $a \pm b$ and $ab$ are again in the fixed field of $G$. If
$b \neq 0$, then $\sigma(b^{-1}) = \sigma(b)^{-1} = b^{-1}$, hence $b^{-1}$ also falls in the fixed
field of $G$. Thus we have verified that the fixed field of $G$ is indeed a subfield
of $K$.

We shall be concerned with the automorphisms of a field which behave
in a prescribed manner on a given subfield.

DEFINITION Let $K$ be a field and let $F$ be a subfield of $K$. Then the
group of automorphisms of $K$ relative to $F$, written $G(K, F)$, is the set of all
automorphisms of $K$ leaving every element of $F$ fixed; that is, the automorphism
$\sigma$ of $K$ is in $G(K, F)$ if and only if $\sigma(\alpha) = \alpha$ for every $\alpha \in F$.

It is not surprising, and is quite easy to prove

LEMMA 5.6.2 $G(K, F)$ is a subgroup of the group of all automorphisms of $K$.

We leave the proof of this lemma to the reader. One remark: $K$ contains
the field of rational numbers $F_0$, since $K$ is of characteristic 0, and it is easy
to see that the fixed field of any group of automorphisms of $K$, being a field,
must contain $F_0$. Hence, every rational number is left fixed by every
automorphism of $K$.

We pause to examine a few examples of the concepts just introduced. 

Example 5.6.1 Let $K$ be the field of complex numbers and let $F$ be the
field of real numbers. We compute $G(K, F)$. If $\sigma$ is any automorphism of
$K$, since $i^2 = -1$, $\sigma(i)^2 = \sigma(i^2) = \sigma(-1) = -1$, hence $\sigma(i) = \pm i$. If,
in addition, $\sigma$ leaves every real number fixed, then for any $a + bi$ where
$a$, $b$ are real, $\sigma(a + bi) = \sigma(a) + \sigma(b)\sigma(i) = a \pm bi$. Each of these possibilities,
namely the mapping $\sigma_1(a + bi) = a + bi$ and $\sigma_2(a + bi) = a - bi$
defines an automorphism of $K$, $\sigma_1$ being the identity automorphism and
$\sigma_2$ complex-conjugation. Thus $G(K, F)$ is a group of order 2.

What is the fixed field of $G(K, F)$? It certainly must contain $F$, but does
it contain more? If $a + bi$ is in the fixed field of $G(K, F)$ then $a + bi =
\sigma_2(a + bi) = a - bi$, whence $b = 0$ and $a = a + bi \in F$. In this case
we see that the fixed field of $G(K, F)$ is precisely $F$ itself.

Example 5.6.2 Let $F_0$ be the field of rational numbers and let $K =
F_0(\sqrt[3]{2})$ where $\sqrt[3]{2}$ is the real cube root of 2. Every element in $K$ is of the
form $\alpha_0 + \alpha_1\sqrt[3]{2} + \alpha_2(\sqrt[3]{2})^2$, where $\alpha_0$, $\alpha_1$, $\alpha_2$ are rational numbers. If
$\sigma$ is an automorphism of $K$, then $\sigma(\sqrt[3]{2})^3 = \sigma((\sqrt[3]{2})^3) = \sigma(2) = 2$, hence
$\sigma(\sqrt[3]{2})$ must also be a cube root of 2 lying in $K$. However, there is only
one real cube root of 2, and since $K$ is a subfield of the real field, we must
have that $\sigma(\sqrt[3]{2}) = \sqrt[3]{2}$. But then $\sigma(\alpha_0 + \alpha_1\sqrt[3]{2} + \alpha_2(\sqrt[3]{2})^2) = \alpha_0 +
\alpha_1\sqrt[3]{2} + \alpha_2(\sqrt[3]{2})^2$, that is, $\sigma$ is the identity automorphism of $K$. We thus
see that $G(K, F_0)$ consists only of the identity map, and in this case the
fixed field of $G(K, F_0)$ is not $F_0$ but is, in fact, larger, being all of $K$.

Example 5.6.3 Let $F_0$ be the field of rational numbers and let $\omega =
e^{2\pi i/5}$; thus $\omega^5 = 1$ and $\omega$ satisfies the polynomial $x^4 + x^3 + x^2 + x + 1$
over $F_0$. By the Eisenstein criterion one can show that $x^4 + x^3 + x^2 +
x + 1$ is irreducible over $F_0$ (see Problem 3). Thus $K = F_0(\omega)$ is of degree
4 over $F_0$ and every element in $K$ is of the form $\alpha_0 + \alpha_1\omega + \alpha_2\omega^2 + \alpha_3\omega^3$
where all of $\alpha_0$, $\alpha_1$, $\alpha_2$, and $\alpha_3$ are in $F_0$. Now, for any automorphism
$\sigma$ of $K$, $\sigma(\omega) \neq 1$, since $\sigma(1) = 1$, and $\sigma(\omega)^5 = \sigma(\omega^5) = \sigma(1) = 1$,
whence $\sigma(\omega)$ is also a 5th root of unity. In consequence, $\sigma(\omega)$ can only
be one of $\omega$, $\omega^2$, $\omega^3$, or $\omega^4$. We claim that each of these possibilities
actually occurs, for let us define the four mappings $\sigma_1$, $\sigma_2$, $\sigma_3$, and $\sigma_4$ by
$\sigma_i(\alpha_0 + \alpha_1\omega + \alpha_2\omega^2 + \alpha_3\omega^3) = \alpha_0 + \alpha_1(\omega^i) + \alpha_2(\omega^i)^2 + \alpha_3(\omega^i)^3$, for
$i = 1, 2, 3$, and 4. Each of these defines an automorphism of $K$ (Problem
4). Therefore, since $\sigma \in G(K, F_0)$ is completely determined by $\sigma(\omega)$,
$G(K, F_0)$ is a group of order 4, with $\sigma_1$ as its unit element. In light of
$\sigma_2^2 = \sigma_4$, $\sigma_2^3 = \sigma_3$, $\sigma_2^4 = \sigma_1$, $G(K, F_0)$ is a cyclic group of order 4.
One can easily prove that the fixed field of $G(K, F_0)$ is $F_0$ itself (Problem 5).
The subgroup $A = \{\sigma_1, \sigma_4\}$ of $G(K, F_0)$ has as its fixed field the set of all
elements $\alpha_0 + \alpha_2(\omega^2 + \omega^3)$, which is an extension of $F_0$ of degree 2.
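
The claims of Example 5.6.3 about the maps $\sigma_i$ can be checked mechanically. In the minimal sketch below an element $\alpha_0 + \alpha_1\omega + \alpha_2\omega^2 + \alpha_3\omega^3$ is stored as the tuple $(\alpha_0, \alpha_1, \alpha_2, \alpha_3)$, and the only arithmetic fact used is $\omega^4 = -(1 + \omega + \omega^2 + \omega^3)$. The representation and the names `reduce_powers` and `sigma` are illustrative assumptions.

```python
from fractions import Fraction


def reduce_powers(powers):
    """Rewrite sum(powers[k] * w**k, k = 0..4) in the basis 1, w, w^2, w^3.

    Uses w^4 = -(1 + w + w^2 + w^3), which holds because w is a root of
    x^4 + x^3 + x^2 + x + 1.
    """
    top = powers[4]
    return tuple(powers[k] - top for k in range(4))


def sigma(i, element):
    """Apply sigma_i, the map sending w to w**i, to a coordinate 4-tuple."""
    powers = [Fraction(0)] * 5
    for k, coeff in enumerate(element):
        powers[(i * k) % 5] += coeff  # w**k goes to w**(i*k), and w**5 = 1
    return reduce_powers(powers)


W = (0, 1, 0, 0)                 # the element w itself
W2_PLUS_W3 = (0, 0, 1, 1)        # w^2 + w^3
```

Applying `sigma(2, ...)` twice agrees with `sigma(4, ...)`, in line with $\sigma_2^2 = \sigma_4$, and `sigma(4, W2_PLUS_W3)` returns `W2_PLUS_W3` while `sigma(4, W)` does not return `W`, matching the description of the fixed field of $A$.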

The examples, although illustrative, are still too special, for note that in
each of them $G(K, F)$ turned out to be a cyclic group. This is highly
atypical for, in general, $G(K, F)$ need not even be abelian (see Theorem
5.6.3). However, despite their speciality, they do bring certain important
things to light. For one thing they show that we must study the effect of
the automorphisms on the roots of polynomials and, for another, they point
out that $F$ need not be equal to all of the fixed field of $G(K, F)$. The cases in
which this does happen are highly desirable ones and are situations with
which we shall soon spend much time and effort.

We now compute an important bound on the size of G(K, F). 

THEOREM 5.6.2 If $K$ is a finite extension of $F$, then $G(K, F)$ is a finite group
and its order, $o(G(K, F))$, satisfies $o(G(K, F)) \leq [K:F]$.

Proof. Let $[K:F] = n$ and suppose that $u_1, \ldots, u_n$ is a basis of $K$ over
$F$. Suppose we can find $n + 1$ distinct automorphisms $\sigma_1, \sigma_2, \ldots, \sigma_{n+1}$
in $G(K, F)$. By the corollary to Theorem 4.3.3 the system of $n$ homogeneous
linear equations in the $n + 1$ unknowns $x_1, \ldots, x_{n+1}$:

$$\begin{aligned}
\sigma_1(u_1)x_1 + \sigma_2(u_1)x_2 + \cdots + \sigma_{n+1}(u_1)x_{n+1} &= 0 \\
&\;\;\vdots \\
\sigma_1(u_i)x_1 + \sigma_2(u_i)x_2 + \cdots + \sigma_{n+1}(u_i)x_{n+1} &= 0 \\
&\;\;\vdots \\
\sigma_1(u_n)x_1 + \sigma_2(u_n)x_2 + \cdots + \sigma_{n+1}(u_n)x_{n+1} &= 0
\end{aligned}$$

has a nontrivial solution (not all 0) $x_1 = a_1, \ldots, x_{n+1} = a_{n+1}$ in $K$. Thus

$$a_1\sigma_1(u_i) + a_2\sigma_2(u_i) + \cdots + a_{n+1}\sigma_{n+1}(u_i) = 0 \qquad (1)$$

for $i = 1, 2, \ldots, n$.

Since every element in $F$ is left fixed by each $\sigma_i$ and since an arbitrary
element $t$ in $K$ is of the form $t = \alpha_1u_1 + \cdots + \alpha_nu_n$ with $\alpha_1, \ldots, \alpha_n$
in $F$, then from the system of equations (1) we get $a_1\sigma_1(t) + \cdots +
a_{n+1}\sigma_{n+1}(t) = 0$ for all $t \in K$. But this contradicts the result of Theorem
5.6.1. Thus Theorem 5.6.2 has been proved.


Theorem 5.6.2 is of central importance in the Galois theory. However, 
aside from its key role there, it serves us well in proving a classic result 
concerned with symmetric rational functions. This result on symmetric 
functions in its turn will play an important part in the Galois theory. 

First a few remarks on the field of rational functions in $n$-variables over a
field $F$. Let us recall that in Section 3.11 we defined the ring of polynomials
in the $n$-variables $x_1, \ldots, x_n$ over $F$ and from this defined the field of
rational functions in $x_1, \ldots, x_n$, $F(x_1, \ldots, x_n)$, over $F$ as the ring of all
quotients of such polynomials.

Let $S_n$ be the symmetric group of degree $n$ considered to be acting on the
set $[1, 2, \ldots, n]$; for $\sigma \in S_n$ and $i$ an integer with $1 \leq i \leq n$, let $\sigma(i)$ be
the image of $i$ under $\sigma$. We can make $S_n$ act on $F(x_1, \ldots, x_n)$ in the
following natural way: for $\sigma \in S_n$ and $r(x_1, \ldots, x_n) \in F(x_1, \ldots, x_n)$, define
the mapping which takes $r(x_1, \ldots, x_n)$ onto $r(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$. We shall
write this mapping of $F(x_1, \ldots, x_n)$ onto itself also as $\sigma$. It is obvious
that these mappings define automorphisms of $F(x_1, \ldots, x_n)$. What is
the fixed field of $F(x_1, \ldots, x_n)$ with respect to $S_n$? It consists of all
rational functions $r(x_1, \ldots, x_n)$ such that $r(x_1, \ldots, x_n) = r(x_{\sigma(1)}, \ldots, x_{\sigma(n)})$
for all $\sigma \in S_n$. But these are precisely those elements in $F(x_1, \ldots, x_n)$
which are known as the symmetric rational functions. Being the fixed field
of $S_n$ they form a subfield of $F(x_1, \ldots, x_n)$, called the field of symmetric
rational functions which we shall denote by $S$. We shall be concerned
with three questions:

1. What is $[F(x_1, \ldots, x_n):S]$?

2. What is $G(F(x_1, \ldots, x_n), S)$?

3. Can we describe $S$ in terms of some particularly easy extension of $F$?

We shall answer these three questions simultaneously.

We can explicitly produce in $S$ some particularly simple functions constructed
from $x_1, \ldots, x_n$ known as the elementary symmetric functions in
$x_1, \ldots, x_n$. These are defined as follows:

$$\begin{aligned}
a_1 &= x_1 + x_2 + \cdots + x_n = \sum_{i=1}^{n} x_i \\
a_2 &= \sum_{i<j} x_ix_j \\
a_3 &= \sum_{i<j<k} x_ix_jx_k \\
&\;\;\vdots \\
a_n &= x_1x_2 \cdots x_n.
\end{aligned}$$

That these are symmetric functions is left as an exercise. For $n = 2$, 3 and
4 we write them out explicitly below.

$n = 2$

$a_1 = x_1 + x_2$.
$a_2 = x_1x_2$.

$n = 3$

$a_1 = x_1 + x_2 + x_3$.
$a_2 = x_1x_2 + x_1x_3 + x_2x_3$.
$a_3 = x_1x_2x_3$.

$n = 4$

$a_1 = x_1 + x_2 + x_3 + x_4$.
$a_2 = x_1x_2 + x_1x_3 + x_1x_4 + x_2x_3 + x_2x_4 + x_3x_4$.
$a_3 = x_1x_2x_3 + x_1x_2x_4 + x_1x_3x_4 + x_2x_3x_4$.
$a_4 = x_1x_2x_3x_4$.
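
The definitions above are easy to experiment with once the $x_i$ are replaced by numbers. The following minimal sketch, in which the names `elementary_symmetric` and `evaluate_p` are illustrative assumptions, computes $a_1, \ldots, a_n$ directly from the defining sums and evaluates the polynomial $t^n - a_1t^{n-1} + a_2t^{n-2} - \cdots + (-1)^na_n$ built from them.

```python
from itertools import combinations, permutations
from math import prod


def elementary_symmetric(xs):
    """Return [a_1, ..., a_n] for the values xs = [x_1, ..., x_n].

    a_k is the sum of all products of k of the x's with increasing indices.
    """
    n = len(xs)
    return [sum(prod(c) for c in combinations(xs, k)) for k in range(1, n + 1)]


def evaluate_p(t, xs):
    """Value at t of t^n - a_1 t^(n-1) + a_2 t^(n-2) - ... + (-1)^n a_n."""
    a = elementary_symmetric(xs)
    n = len(xs)
    return t ** n + sum((-1) ** k * a[k - 1] * t ** (n - k) for k in range(1, n + 1))


def is_symmetric_at(xs):
    """Check that every rearrangement of xs yields the same a_1, ..., a_n."""
    reference = elementary_symmetric(xs)
    return all(elementary_symmetric(list(p)) == reference for p in permutations(xs))
```

For the sample values 1, 2, 3, 4 this gives $a_1 = 10$, $a_2 = 35$, $a_3 = 50$, $a_4 = 24$, and `evaluate_p` vanishes at each of the four values, as expected of a polynomial having the $x_i$ as its roots.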

Note that when $n = 2$, $x_1$ and $x_2$ are the roots of the polynomial $t^2 -
a_1t + a_2$; that when $n = 3$, $x_1$, $x_2$, and $x_3$ are roots of $t^3 - a_1t^2 + a_2t - a_3$;
and that when $n = 4$, $x_1$, $x_2$, $x_3$, and $x_4$ are all roots of $t^4 - a_1t^3 + a_2t^2 -
a_3t + a_4$.

Since $a_1, \ldots, a_n$ are all in $S$, the field $F(a_1, \ldots, a_n)$ obtained by adjoining
$a_1, \ldots, a_n$ to $F$ must lie in $S$. Our objective is now twofold,
namely, to prove

1. $[F(x_1, \ldots, x_n):S] = n!$.

2. $S = F(a_1, \ldots, a_n)$.

Since the group $S_n$ is a group of automorphisms of $F(x_1, \ldots, x_n)$
leaving $S$ fixed, $S_n \subset G(F(x_1, \ldots, x_n), S)$. Thus, by Theorem 5.6.2,
$[F(x_1, \ldots, x_n):S] \geq o(G(F(x_1, \ldots, x_n), S)) \geq o(S_n) = n!$. If we could
show that $[F(x_1, \ldots, x_n):F(a_1, \ldots, a_n)] \leq n!$, well then, since $F(a_1, \ldots, a_n)$
is a subfield of $S$, we would have $n! \geq [F(x_1, \ldots, x_n):F(a_1, \ldots, a_n)] =
[F(x_1, \ldots, x_n):S][S:F(a_1, \ldots, a_n)] \geq n!$. But then we would get that
$[F(x_1, \ldots, x_n):S] = n!$, $[S:F(a_1, \ldots, a_n)] = 1$ and so $S = F(a_1, \ldots, a_n)$,
and, finally, $S_n = G(F(x_1, \ldots, x_n), S)$ (this latter from the second sentence
of this paragraph). These are precisely the conclusions we seek.

Thus we merely must prove that $[F(x_1, \ldots, x_n):F(a_1, \ldots, a_n)] \leq n!$.

To see how this settles the whole affair, note that the polynomial $p(t) =
t^n - a_1t^{n-1} + a_2t^{n-2} \cdots + (-1)^na_n$, which has coefficients in $F(a_1, \ldots, a_n)$,
factors over $F(x_1, \ldots, x_n)$ as $p(t) = (t - x_1)(t - x_2) \cdots (t - x_n)$. (This
is in fact the origin of the elementary symmetric functions.) Thus $p(t)$,
of degree $n$ over $F(a_1, \ldots, a_n)$, splits as a product of linear factors over
$F(x_1, \ldots, x_n)$. It cannot split over a proper subfield of $F(x_1, \ldots, x_n)$
which contains $F(a_1, \ldots, a_n)$ for this subfield would then have to contain
both $F$ and each of the roots of $p(t)$, namely, $x_1, x_2, \ldots, x_n$; but then this
subfield would be all of $F(x_1, \ldots, x_n)$. Thus we see that $F(x_1, \ldots, x_n)$ is
the splitting field of the polynomial $p(t) = t^n - a_1t^{n-1} + \cdots + (-1)^na_n$
over $F(a_1, \ldots, a_n)$. Since $p(t)$ is of degree $n$, by Theorem 5.3.2 we get
$[F(x_1, \ldots, x_n):F(a_1, \ldots, a_n)] \leq n!$. Thus all our claims are established.
We summarize the whole discussion in the basic and important result

THEOREM 5.6.3 Let $F$ be a field and let $F(x_1, \ldots, x_n)$ be the field of rational
functions in $x_1, \ldots, x_n$ over $F$. Suppose that $S$ is the field of symmetric rational
functions; then

1. $[F(x_1, \ldots, x_n):S] = n!$.

2. $G(F(x_1, \ldots, x_n), S) = S_n$, the symmetric group of degree $n$.

3. If $a_1, \ldots, a_n$ are the elementary symmetric functions in $x_1, \ldots, x_n$, then
$S = F(a_1, a_2, \ldots, a_n)$.

4. $F(x_1, \ldots, x_n)$ is the splitting field over $F(a_1, \ldots, a_n) = S$ of the polynomial
$t^n - a_1t^{n-1} + a_2t^{n-2} \cdots + (-1)^na_n$.

We mentioned earlier that given any integer $n$ it is possible to construct
a field and a polynomial of degree $n$ over this field whose splitting field is of
maximal possible degree, $n!$, over this field. Theorem 5.6.3 explicitly
provides us with such an example for if we put $S = F(a_1, \ldots, a_n)$, the
rational function field in $n$ variables $a_1, \ldots, a_n$ and consider the splitting
field of the polynomial $t^n - a_1t^{n-1} + a_2t^{n-2} \cdots + (-1)^na_n$ over $S$ then
it is of degree $n!$ over $S$.

Part 3 of Theorem 5.6.3 is a very classical theorem. It asserts that a symmetric
rational function in n variables is a rational function in the elementary symmetric
functions of these variables. This result can even be sharpened to: A symmetric
polynomial in n variables is a polynomial in their elementary symmetric
functions (see Problem 7). This result is known as the theorem on symmetric
polynomials.
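As a small numerical illustration of the theorem on symmetric polynomials, the
following Python sketch checks, for n = 2, that the symmetric polynomials
x_1^2 + x_2^2 and x_1^3 + x_2^3 agree with a_1^2 - 2a_2 and a_1^3 - 3a_1a_2, where
a_1 = x_1 + x_2 and a_2 = x_1x_2. The function names and the sample points are
illustrative assumptions, not part of the text; exact rational arithmetic is
used so that rounding plays no role. Agreement at sample points illustrates the
identities but is of course not a proof of the theorem.

```python
from fractions import Fraction
from itertools import product


def elementary_symmetric_2(x1, x2):
    """Return (a1, a2), the elementary symmetric functions of x1, x2."""
    return x1 + x2, x1 * x2


def check_identities(x1, x2):
    """Check two symmetric polynomials against their expressions in a1, a2."""
    a1, a2 = elementary_symmetric_2(x1, x2)
    squares_ok = x1**2 + x2**2 == a1**2 - 2 * a2
    cubes_ok = x1**3 + x2**3 == a1**3 - 3 * a1 * a2
    return squares_ok and cubes_ok


# Hypothetical sample points: a small grid of rational numbers.
samples = [Fraction(n, d) for n in range(-3, 4) for d in (1, 2, 3)]
all_ok = all(check_identities(x1, x2) for x1, x2 in product(samples, repeat=2))
print(all_ok)
```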

In the examples we discussed of groups of automorphisms of fields and of 
fixed fields under such groups, we saw that it might very well happen that F 
is actually smaller than the whole fixed field of G(K, F ). Certainly F is 
always contained in this field but need not fill it out. Thus to impose the 
condition on an extension K of F that F be precisely the fixed field of 
G(K, F) is a genuine limitation on the type of extension of F that we are 
considering. It is in this kind of extension that we shall be most interested. 

DEFINITION K is a normal extension of F if K is a finite extension of F 
such that F is the fixed field of G(K, F). 

Another way of saying the same thing: If K is a normal extension of F,
then every element in K which is outside F is moved by some element in 
G(K, F). In the examples discussed, Examples 5.6.1 and 5.6.3 were 
normal extensions whereas Example 5.6.2 was not. 

An immediate consequence of the assumption of normality is that it 
allows us to calculate with great accuracy the size of the fixed field of any 
subgroup of G ( K , F) and, in particular, to sharpen Theorem 5.6.2 from an 
inequality to an equality. 

THEOREM 5.6.4 Let K be a normal extension of F and let H be a subgroup
of G(K, F); let K_H = {x ∈ K | σ(x) = x for all σ ∈ H} be the fixed field of H.
Then

1. [K:K_H] = o(H).

2. H = G(K, K_H).

(In particular, when H = G(K, F), [K:F] = o(G(K, F)).)

Proof. Since every element in H leaves K_H elementwise fixed, certainly
H ⊂ G(K, K_H). By Theorem 5.6.2 we know that [K:K_H] ≥ o(G(K, K_H));
and since o(G(K, K_H)) ≥ o(H) we have the inequalities [K:K_H] ≥
o(G(K, K_H)) ≥ o(H). If we could show that [K:K_H] = o(H), it would
immediately follow that o(H) = o(G(K, K_H)) and, as a subgroup of
G(K, K_H) having order that of G(K, K_H), we would obtain that H =
G(K, K_H). So we must merely show that [K:K_H] = o(H) to prove everything.

By Theorem 5.5.1 there exists an a ∈ K such that K = K_H(a); this a
must therefore satisfy an irreducible polynomial over K_H of degree m =
[K:K_H] and no nontrivial polynomial of lower degree (Theorem 5.1.3).
Let the elements of H be σ_1, σ_2, ..., σ_h, where σ_1 is the identity of G(K, F)
and where h = o(H). Consider the elementary symmetric functions of
a = σ_1(a), σ_2(a), ..., σ_h(a), namely,

    α_1 = σ_1(a) + σ_2(a) + ... + σ_h(a) = Σ_{i=1}^{h} σ_i(a)

    α_2 = Σ_{i<j} σ_i(a)σ_j(a)

    ...

    α_h = σ_1(a)σ_2(a) ... σ_h(a).

Each α_i is invariant under every σ ∈ H. (Prove!) Thus, by the definition
of K_H, α_1, α_2, ..., α_h are all elements of K_H. However, a (as well as
σ_2(a), ..., σ_h(a)) is a root of the polynomial

    p(x) = (x - σ_1(a))(x - σ_2(a)) ... (x - σ_h(a))
         = x^h - α_1 x^(h-1) + α_2 x^(h-2) + ... + (-1)^h α_h

having coefficients in K_H. By the nature of a, this forces h ≥ m = [K:K_H],
whence o(H) ≥ [K:K_H]. Since we already know that o(H) ≤ [K:K_H] we obtain
o(H) = [K:K_H], the desired conclusion.

When H = G(K, F), by the normality of K over F, K_H = F; consequently
for this particular case we read off the result [K:F] = o(G(K, F)).

We are rapidly nearing the central theorem of the Galois theory. What
we still lack is the relationship between splitting fields and normal extensions.
This gap is filled by

THEOREM 5.6.5 K is a normal extension of F if and only if K is the splitting
field of some polynomial over F.

Proof. In one direction the proof will be highly reminiscent of that of
Theorem 5.6.4.

Suppose that K is a normal extension of F; by Theorem 5.5.1, K = F(a).
Consider the polynomial p(x) = (x - σ_1(a))(x - σ_2(a)) ... (x - σ_n(a))
over K, where σ_1, σ_2, ..., σ_n are all the elements of G(K, F). Expanding
p(x) we see that p(x) = x^n - α_1 x^(n-1) + α_2 x^(n-2) + ... + (-1)^n α_n, where
α_1, ..., α_n are the elementary symmetric functions in a = σ_1(a), σ_2(a),
..., σ_n(a). But then α_1, ..., α_n are each invariant with respect to every
σ ∈ G(K, F), whence by the normality of K over F, must all be in F.
Therefore, K splits the polynomial p(x) ∈ F[x] into a product of linear
factors. Since a is a root of p(x) and since a generates K over F, a can be in
no proper subfield of K which contains F. Thus K is the splitting field of
p(x) over F.

Now for the other direction; it is a little more complicated. We separate
off one piece of its proof in

LEMMA 5.6.3 Let K be the splitting field of f(x) in F[x] and let p(x) be an
irreducible factor of f(x) in F[x]. If the roots of p(x) are α_1, ..., α_r, then for
each i there exists an automorphism σ_i in G(K, F) such that σ_i(α_1) = α_i.

Proof. Since every root of p(x) is a root of f(x), it must lie in K. Let
α_1, α_i be any two roots of p(x). By Theorem 5.3.3, there is an isomorphism
τ of F_1 = F(α_1) onto F_1' = F(α_i) taking α_1 onto α_i and leaving every
element of F fixed. Now K is the splitting field of f(x) considered as a
polynomial over F_1; likewise, K is the splitting field of f(x) considered as a
polynomial over F_1'. By Theorem 5.3.4 there is an isomorphism σ_i of K
onto K (thus an automorphism of K) coinciding with τ on F_1. But then
σ_i(α_1) = τ(α_1) = α_i and σ_i leaves every element of F fixed. This is, of
course, exactly what Lemma 5.6.3 claims.

We return to the completion of the proof of Theorem 5.6.5. Assume that
K is the splitting field of the polynomial f(x) in F[x]. We want to show
that K is normal over F. We proceed by induction on [K:F], assuming
that for any pair of fields K_1, F_1 with [K_1:F_1] less than [K:F], whenever
K_1 is the splitting field over F_1 of a polynomial in F_1[x], then K_1 is normal
over F_1.

If f(x) ∈ F[x] splits into linear factors over F, then K = F, which is
certainly a normal extension of F. So, assume that f(x) has an irreducible
factor p(x) ∈ F[x] of degree r > 1. The r distinct roots α_1, α_2, ..., α_r of
p(x) all lie in K and K is the splitting field of f(x) considered as a
polynomial over F(α_1). Writing n = [K:F], since

    [K:F(α_1)] = [K:F]/[F(α_1):F] = n/r < n,

by our induction hypothesis K is a normal extension of F(α_1).

Let θ ∈ K be left fixed by every automorphism σ ∈ G(K, F); we would
like to show that θ is in F. Now, any automorphism in G(K, F(α_1)) certainly
leaves F fixed, hence leaves θ fixed; by the normality of K over F(α_1),
this implies that θ is in F(α_1). Thus

    θ = λ_0 + λ_1 α_1 + λ_2 α_1^2 + ... + λ_(r-1) α_1^(r-1), where λ_0, ..., λ_(r-1) ∈ F.   (1)

By Lemma 5.6.3 there is an automorphism σ_i of K, σ_i ∈ G(K, F), such
that σ_i(α_1) = α_i; since this σ_i leaves θ and each λ_j fixed, applying it to
(1) we obtain

    θ = λ_0 + λ_1 α_i + λ_2 α_i^2 + ... + λ_(r-1) α_i^(r-1) for i = 1, 2, ..., r.   (2)

Thus the polynomial

    q(x) = λ_(r-1) x^(r-1) + λ_(r-2) x^(r-2) + ... + λ_1 x + (λ_0 - θ)

in K[x], of degree at most r - 1, has the r distinct roots α_1, α_2, ..., α_r.
This can only happen if all its coefficients are 0; in particular, λ_0 - θ = 0,
whence θ = λ_0 and so is in F. This completes the induction and proves that K
is a normal extension of F. Theorem 5.6.5 is now completely proved.

DEFINITION Let f(x) be a polynomial in F[x] and let K be its splitting
field over F. The Galois group of f(x) is the group G(K, F) of all the
automorphisms of K, leaving every element of F fixed.

Note that the Galois group of f(x) can be considered as a group of
permutations of its roots, for if α is a root of f(x) and if σ ∈ G(K, F),
then σ(α) is also a root of f(x).

We now come to the result known as the fundamental theorem of Galois 
theory. It sets up a one-to-one correspondence between the subfields of the 
splitting field of f(x) and the subgroups of its Galois group. Moreover, it
gives a criterion that a subfield of a normal extension itself be a normal 
extension of F. This fundamental theorem will be used in the next section 
to derive conditions for the solvability by radicals of the roots of a poly¬ 
nomial. 

THEOREM 5.6.6 Let f(x) be a polynomial in F[x], K its splitting field over
F, and G(K, F) its Galois group. For any subfield T of K which contains F let
G(K, T) = {σ ∈ G(K, F) | σ(t) = t for every t ∈ T} and for any subgroup
H of G(K, F) let K_H = {x ∈ K | σ(x) = x for every σ ∈ H}. Then the
association of T with G(K, T) sets up a one-to-one correspondence of the set of subfields
of K which contain F onto the set of subgroups of G(K, F) such that

1. T = K_G(K,T).

2. H = G(K, K_H).

3. [K:T] = o(G(K, T)), [T:F] = index of G(K, T) in G(K, F).

4. T is a normal extension of F if and only if G(K, T) is a normal subgroup of
G(K, F).

5. When T is a normal extension of F, then G(T, F) is isomorphic to
G(K, F)/G(K, T).

Proof. Since K is the splitting field of f(x) over F it is also the splitting
field of f(x) over any subfield T which contains F; therefore, by Theorem
5.6.5, K is a normal extension of T. Thus, by the definition of normality,
T is the fixed field of G(K, T), that is, T = K_G(K,T), proving part 1.

Since K is a normal extension of F, by Theorem 5.6.4, given a subgroup H
of G(K, F), then H = G(K, K_H), which is the assertion of part 2. Moreover,
this shows that any subgroup of G(K, F) arises in the form G(K, T),
whence the association of T with G(K, T) maps the set of all subfields of K
containing F onto the set of all subgroups of G(K, F). That it is one-to-one
is clear, for, if G(K, T_1) = G(K, T_2) then, by part 1, T_1 = K_G(K,T_1) =
K_G(K,T_2) = T_2.

Since K is normal over T, again using Theorem 5.6.4, [K:T] =
o(G(K, T)); but then we have o(G(K, F)) = [K:F] = [K:T][T:F] =
o(G(K, T))[T:F], whence

    [T:F] = o(G(K, F))/o(G(K, T)) = index of G(K, T) in G(K, F).

This is part 3.

The only parts which remain to be proved are those which pertain to
normality. We first make the following observation. T is a normal extension
of F if and only if for every σ ∈ G(K, F), σ(T) ⊂ T. Why? We know
by Theorem 5.5.1 that T = F(a); thus if σ(T) ⊂ T, then σ(a) ∈ T for
all σ ∈ G(K, F). But, as we saw in the proof of Theorem 5.6.5, this implies
that T is the splitting field of

    p(x) = Π_{σ ∈ G(K,F)} (x - σ(a)),

which has coefficients in F. As a splitting field, T, by Theorem 5.6.5, is
a normal extension of F. Conversely, if T is a normal extension of F, then
T = F(a), where the minimal polynomial of a, p(x), over F has all its roots
in T (Theorem 5.6.5). However, for any σ ∈ G(K, F), σ(a) is also a root
of p(x), whence σ(a) must be in T. Since T is generated by a over F, we
get that σ(T) ⊂ T for every σ ∈ G(K, F).

Thus T is a normal extension of F if and only if for any σ ∈ G(K, F),
τ ∈ G(K, T) and t ∈ T, σ(t) ∈ T and so τ(σ(t)) = σ(t); that is, if and
only if σ^(-1)τσ(t) = t. But this says that T is normal over F if and only
if σ^(-1)G(K, T)σ ⊂ G(K, T) for every σ ∈ G(K, F). This last condition
being precisely that which defines G(K, T) as a normal subgroup of
G(K, F), we see that part 4 is proved.

Finally, if T is normal over F, given σ ∈ G(K, F), since σ(T) ⊂ T,
σ induces an automorphism σ* of T defined by σ*(t) = σ(t) for every
t ∈ T. Because σ* leaves every element of F fixed, σ* must be in G(T, F).
Also, as is evident, for any σ, ψ ∈ G(K, F), (σψ)* = σ*ψ*, whence the
mapping of G(K, F) into G(T, F) defined by σ → σ* is a homomorphism
of G(K, F) into G(T, F). What is the kernel of this homomorphism?
It consists of all elements σ in G(K, F) such that σ* is the identity map on
T. That is, the kernel is the set of all σ ∈ G(K, F) such that t = σ*(t) =
σ(t) for every t ∈ T; by the very definition, we get that the kernel is exactly
G(K, T). The image of G(K, F) in G(T, F), by Theorem 2.7.1, is isomorphic to
G(K, F)/G(K, T), whose order is o(G(K, F))/o(G(K, T)) = [T:F] (by
part 3) = o(G(T, F)) (by Theorem 5.6.4). Thus the image of G(K, F)
in G(T, F) is all of G(T, F) and so we have G(T, F) isomorphic to
G(K, F)/G(K, T). This finishes the proof of part 5 and thereby completes
the proof of Theorem 5.6.6.

Problems 

1. If K is a field and S a set of automorphisms of K, prove that the fixed
field of S and that of S̄ (the subgroup of the group of all automorphisms
of K generated by S) are identical.

2. Prove Lemma 5.6.2.

3. Using the Eisenstein criterion, prove that x^4 + x^3 + x^2 + x + 1
is irreducible over the field of rational numbers.

4. In Example 5.6.3, prove that each mapping σ_i defined is an
automorphism of F_0(ω).

5. In Example 5.6.3, prove that the fixed field of F_0(ω) under σ_1,
σ_2, σ_3, σ_4 is precisely F_0.

6. Prove directly that any automorphism of K must leave every rational
number fixed.

7. Prove that a symmetric polynomial in x_1, ..., x_n is a polynomial in
the elementary symmetric functions in x_1, ..., x_n.

8. Express the following as polynomials in the elementary symmetric
functions in x_1, x_2, x_3:

(a) x_1^2 + x_2^2 + x_3^2.

(b) x_1^3 + x_2^3 + x_3^3.

(c) (x_1 - x_2)^2 (x_1 - x_3)^2 (x_2 - x_3)^2.

9. If α_1, α_2, α_3 are the roots of the cubic polynomial x^3 + 7x^2 -
8x + 3, find the cubic polynomial whose roots are

(a) α_1^2, α_2^2, α_3^2.   (b) 1/α_1, 1/α_2, 1/α_3.   (c) α_1^3, α_2^3, α_3^3.

*10. Prove Newton's identities, namely, if α_1, α_2, ..., α_n are the roots of
f(x) = x^n + a_1 x^(n-1) + a_2 x^(n-2) + ... + a_n and if s_k = α_1^k +
α_2^k + ... + α_n^k, then

(a) s_k + a_1 s_(k-1) + a_2 s_(k-2) + ... + a_(k-1) s_1 + k a_k = 0 if k = 1, 2, ..., n.

(b) s_k + a_1 s_(k-1) + ... + a_n s_(k-n) = 0 for k > n.

(c) For n = 5, apply part (a) to determine s_2, s_3, s_4, and s_5.

11. Prove that the elementary symmetric functions in x_1, ..., x_n are
indeed symmetric functions in x_1, ..., x_n.

12. If p(x) = x^n - 1 prove that the Galois group of p(x) over the field
of rational numbers is abelian.

The complex number ω is a primitive nth root of unity if ω^n = 1 but ω^m ≠ 1
for 0 < m < n. F_0 will denote the field of rational numbers.

13. (a) Prove that there are φ(n) primitive nth roots of unity where
φ(n) is the Euler φ-function.

(b) If ω is a primitive nth root of unity prove that F_0(ω) is the
splitting field of x^n - 1 over F_0 (and so is a normal extension
of F_0).

(c) If ω_1, ..., ω_φ(n) are the φ(n) primitive nth roots of unity, prove
that any automorphism of F_0(ω_1) takes ω_1 into some ω_i.

(d) Prove that [F_0(ω_1):F_0] ≤ φ(n).

14. The notation is as in Problem 13.

*(a) Prove that there is an automorphism σ_i of F_0(ω_1) which takes
ω_1 into ω_i.

(b) Prove the polynomial p_n(x) = (x - ω_1)(x - ω_2) ... (x - ω_φ(n))
has rational coefficients. (The polynomial p_n(x) is called the
nth cyclotomic polynomial.)

*(c) Prove that, in fact, the coefficients of p_n(x) are integers.

**15. Use the results of Problems 13 and 14 to prove that p_n(x) is irreducible
over F_0 for all n ≥ 1. (See Problem 8, Section 3.)

16. For n = 3, 4, 6, and 8, calculate p_n(x) explicitly, show that it has
integer coefficients and prove directly that it is irreducible over F_0.

17. (a) Prove that the Galois group of x^3 - 2 over F_0 is isomorphic to
S_3, the symmetric group of degree 3.

(b) Find the splitting field, K, of x^3 - 2 over F_0.

(c) For every subgroup H of S_3 find K_H and check the correspondence
given in Theorem 5.6.6.

(d) Find a normal extension in K of degree 2 over F_0.

18. If the field F contains a primitive nth root of unity, prove that the
Galois group of x^n - a, for a ∈ F, is abelian.


5.7 Solvability by Radicals 

Given the specific polynomial x^2 + 3x + 4 over the field of rational
numbers F_0, from the quadratic formula for its roots we know that its
roots are (-3 ± √-7)/2; thus the field F_0(√-7) is the splitting field of
x^2 + 3x + 4 over F_0. Consequently there is an element γ = -7 in F_0
such that the extension field F_0(ω), where ω^2 = γ, is such that it contains
all the roots of x^2 + 3x + 4.

From a slightly different point of view, given the general quadratic
polynomial p(x) = x^2 + a_1 x + a_2 over F, we can consider it as a particular
polynomial over the field F(a_1, a_2) of rational functions in the two variables
a_1 and a_2 over F; in the extension obtained by adjoining ω to F(a_1, a_2),
where ω^2 = a_1^2 - 4a_2 ∈ F(a_1, a_2), we find all the roots of p(x). There is
a formula which expresses the roots of p(x) in terms of a_1, a_2 and square
roots of rational functions of these.
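To make the quadratic case concrete, here is a minimal Python sketch, assuming
we work numerically inside the complex numbers rather than in F(a_1, a_2): it
adjoins ω with ω^2 = a_1^2 - 4a_2 and forms the two roots (-a_1 ± ω)/2. The
function name quadratic_roots is a hypothetical helper introduced only for
this illustration.

```python
import cmath


def quadratic_roots(a1, a2):
    """Roots of x^2 + a1*x + a2, written in terms of omega with omega^2 = a1^2 - 4*a2."""
    omega = cmath.sqrt(a1 * a1 - 4 * a2)  # one square root of a1^2 - 4*a2
    return (-a1 + omega) / 2, (-a1 - omega) / 2


# The specific polynomial x^2 + 3x + 4 discussed above; here omega^2 = -7.
r1, r2 = quadratic_roots(3, 4)
print(r1, r2)
```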

For a cubic equation the situation is very similar; given the general cubic
equation p(x) = x^3 + a_1 x^2 + a_2 x + a_3 an explicit formula can be given,
involving combinations of square roots and cube roots of rational functions
in a_1, a_2, a_3. While somewhat messy, they are explicitly given by Cardan's
formulas: Let p = a_2 - (a_1^2/3) and

    q = 2a_1^3/27 - a_1a_2/3 + a_3

and let

    P = ∛( -q/2 + √(p^3/27 + q^2/4) )

and

    Q = ∛( -q/2 - √(p^3/27 + q^2/4) )

(with cube roots chosen properly); then the roots are P + Q - (a_1/3),
ωP + ω^2 Q - (a_1/3), and ω^2 P + ωQ - (a_1/3), where ω ≠ 1 is a cube
root of 1. The above formulas only serve to illustrate for us that by
adjoining a certain square root and then a cube root to F(a_1, a_2, a_3) we
reach a field in which p(x) has its roots.
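Cardan's formulas can be tried out numerically. The Python sketch below is an
illustration under stated assumptions, not a definitive implementation: it
works in the complex numbers with floating-point arithmetic, and it reads
"cube roots chosen properly" as the standard requirement PQ = -p/3, which it
enforces by taking one cube root P and setting Q = -p/(3P). The function name
cardano_roots and the tolerance 1e-14 are choices made for this example.

```python
import cmath


def cardano_roots(a1, a2, a3):
    """Roots of x^3 + a1*x^2 + a2*x + a3 computed from Cardan's formulas."""
    p = a2 - a1**2 / 3
    q = 2 * a1**3 / 27 - a1 * a2 / 3 + a3
    root = cmath.sqrt(p**3 / 27 + q**2 / 4)  # the adjoined square root
    P_cubed = -q / 2 + root
    if abs(P_cubed) < 1e-14:
        # Avoid P = 0 by using the other sign of the square root.
        P_cubed = -q / 2 - root
    if abs(P_cubed) < 1e-14:
        # Both vanish only when p = q = 0: a triple root at -a1/3.
        return [complex(-a1 / 3)] * 3
    P = P_cubed ** (1 / 3)   # one cube root (the principal one)
    Q = -p / (3 * P)         # matching cube root, so that P*Q = -p/3
    omega = complex(-0.5, 3**0.5 / 2)  # a cube root of 1 other than 1
    shift = a1 / 3
    return [
        P + Q - shift,
        omega * P + omega**2 * Q - shift,
        omega**2 * P + omega * Q - shift,
    ]


# Example: x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3).
roots = cardano_roots(-6, 11, -6)
```

Substituting each returned value back into the cubic is a convenient sanity
check, since rounding makes the computed roots only approximate.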

For fourth-degree polynomials, which we shall not give explicitly, by 
using rational operations and square roots, we can reduce the problem to 
that of solving a certain cubic, so here too a formula can be given expressing 
the roots in terms of combinations of radicals (surds) of rational functions 
of the coefficients. 

For polynomials of degree five and higher, no such universal radical 
formula can be given, for we shall prove that it is impossible to express 
their roots, in general, in this way. 

Given a field F and a polynomial p(x) ∈ F[x], we say that p(x) is solvable
by radicals over F if we can find a finite sequence of fields F_1 = F(ω_1),
F_2 = F_1(ω_2), ..., F_k = F_(k-1)(ω_k) such that ω_1^(r_1) ∈ F, ω_2^(r_2) ∈ F_1, ...,
ω_k^(r_k) ∈ F_(k-1) and such that the roots of p(x) all lie in F_k.

If K is the splitting field of p(x) over F, then p(x) is solvable by radicals
over F if we can find a sequence of fields as above such that K ⊂ F_k. An
important remark, and one we shall use later, in the proof of Theorem
5.7.2, is that if such an F_k can be found, we can, without loss of generality,
assume it to be a normal extension of F; we leave its proof as a problem
(Problem 1).

By the general polynomial of degree n over F, p(x) = x^n + a_1 x^(n-1) + ... + a_n,
we mean the following: Let F(a_1, ..., a_n) be the field of rational functions
in the n variables a_1, ..., a_n over F, and consider the particular
polynomial p(x) = x^n + a_1 x^(n-1) + ... + a_n over the field F(a_1, ..., a_n).
We say that it is solvable by radicals if it is solvable by radicals over
F(a_1, ..., a_n). This really expresses the intuitive idea of "finding a
formula" for the roots of p(x) involving combinations of mth roots, for various
m's, of rational functions in a_1, a_2, ..., a_n. For n = 2, 3, and 4, we pointed
out that this can always be done. For n ≥ 5, Abel proved that this cannot
be done. However, this does not exclude the possibility that a given
polynomial over F may be solvable by radicals. In fact, we shall give a criterion
for this in terms of the Galois group of the polynomial. But first we must
develop a few purely group-theoretical results. Some of these occurred as
problems at the end of Chapter 2, but we nevertheless do them now officially.

DEFINITION A group G is said to be solvable if we can find a finite chain
of subgroups G = N_0 ⊃ N_1 ⊃ N_2 ⊃ ... ⊃ N_k = (e), where each N_i is a
normal subgroup of N_(i-1) and such that every factor group N_(i-1)/N_i is
abelian.

Every abelian group is solvable, for merely take N_0 = G and N_1 = (e)
to satisfy the above definition. The symmetric group of degree 3, S_3, is
solvable, for take N_1 = {e, (1, 2, 3), (1, 3, 2)}; N_1 is a normal subgroup of
S_3 and S_3/N_1 and N_1/(e) are both abelian, being of orders 2 and 3,
respectively. It can be shown that S_4 is solvable (Problem 3). For n ≥ 5 we
show in Theorem 5.7.1 below that S_n is not solvable.

We seek an alternative description for solvability. Given the group G and
elements a, b in G, then the commutator of a and b is the element a^(-1)b^(-1)ab.
The commutator subgroup, G', of G is the subgroup of G generated by all the
commutators in G. (It is not necessarily true that the set of commutators
itself forms a subgroup of G.) It was an exercise before that G' is a normal
subgroup of G. Moreover, the group G/G' is abelian, for, given any two
elements in it, aG', bG', with a, b ∈ G, then

    (aG')(bG') = abG' = ba(a^(-1)b^(-1)ab)G'

    = (since a^(-1)b^(-1)ab ∈ G') baG' = (bG')(aG').

On the other hand, if M is a normal subgroup of G such that G/M is abelian,
then M ⊃ G', for, given a, b ∈ G, then (aM)(bM) = (bM)(aM), from
which we deduce abM = baM, whence a^(-1)b^(-1)abM = M and so
a^(-1)b^(-1)ab ∈ M. Since M contains all commutators, it contains the group
these generate, namely G'.

G' is a group in its own right, so we can speak of its commutator subgroup
G^(2) = (G')'. This is the subgroup of G generated by all elements
(a')^(-1)(b')^(-1)a'b' where a', b' ∈ G'. It is easy to prove that not only is G^(2)
a normal subgroup of G' but it is also a normal subgroup of G (Problem 4).
We continue this way and define the higher commutator subgroups G^(m) by
G^(m) = (G^(m-1))'. Each G^(m) is a normal subgroup of G (Problem 4) and
G^(m-1)/G^(m) is an abelian group.

In terms of these higher commutator subgroups of G, we have a very
succinct criterion for solvability, namely,

LEMMA 5.7.1 G is solvable if and only if G^(k) = (e) for some integer k.

Proof. If G^(k) = (e) let N_0 = G, N_1 = G', N_2 = G^(2), ..., N_k =
G^(k) = (e). We have

    G = N_0 ⊃ N_1 ⊃ N_2 ⊃ ... ⊃ N_k = (e);

each N_i being normal in G is certainly normal in N_(i-1). Finally,

    N_(i-1)/N_i = G^(i-1)/G^(i) = G^(i-1)/(G^(i-1))',

hence is abelian. Thus by the definition of solvability G is a solvable group.

Conversely, if G is a solvable group, there is a chain G = N_0 ⊃ N_1 ⊃
N_2 ⊃ ... ⊃ N_k = (e) where each N_i is normal in N_(i-1) and where N_(i-1)/N_i
is abelian. But then the commutator subgroup N_(i-1)' of N_(i-1) must be
contained in N_i. Thus N_1 ⊃ N_0' = G', N_2 ⊃ N_1' ⊃ (G')' = G^(2),
N_3 ⊃ N_2' ⊃ (G^(2))' = G^(3), ..., N_i ⊃ G^(i), (e) = N_k ⊃ G^(k). We therefore
obtain that G^(k) = (e).
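The criterion of Lemma 5.7.1 lends itself to direct computation for small
groups. The following Python sketch, offered as an illustration rather than as
part of the proof, computes the orders of G, G', G^(2), ... for G = S_n by brute
force, stopping when the chain stops shrinking. Permutations are represented
as tuples acting on {0, ..., n-1}; this representation and all function names
are assumptions of the example.

```python
from itertools import permutations


def compose(p, q):
    """Composite permutation i -> p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def generated_subgroup(generators, n):
    """Subgroup generated by a set of permutations (closure under products).

    In a finite group, closing under multiplication already gives a subgroup.
    """
    identity = tuple(range(n))
    elements = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in elements:
                elements.add(h)
                frontier.append(h)
    return elements


def commutator_subgroup(group, n):
    """Subgroup generated by all commutators a^-1 b^-1 a b with a, b in group."""
    commutators = {
        compose(compose(inverse(a), inverse(b)), compose(a, b))
        for a in group
        for b in group
    }
    return generated_subgroup(commutators, n)


def derived_series_orders(n):
    """Orders of S_n, its commutator subgroup, and so on, until the chain is stationary."""
    group = set(permutations(range(n)))
    orders = [len(group)]
    while True:
        derived = commutator_subgroup(group, n)
        if len(derived) == len(group):
            return orders
        orders.append(len(derived))
        group = derived


for n in (3, 4, 5):
    print(n, derived_series_orders(n))
```

By the lemma, S_n is solvable exactly when the last order in the list is 1,
so the output can be read against the statements about S_3, S_4, and S_n for
n ≥ 5 made in this section.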

COROLLARY If G is a solvable group and if Ḡ is a homomorphic image of G,
then Ḡ is solvable.

Proof. Since Ḡ is a homomorphic image of G it is immediate that (Ḡ)^(k)
is the image of G^(k). Since G^(k) = (e) for some k, (Ḡ)^(k) = (e) for the same
k, whence by the lemma Ḡ is solvable.

The next lemma is the key step in proving that the infinite family of
groups S_n, with n ≥ 5, is not solvable; here S_n is the symmetric group of
degree n.

LEMMA 5.7.2 Let G = S_n, where n ≥ 5; then G^(k) for k = 1, 2, ...
contains every 3-cycle of S_n.

Proof. We first remark that for an arbitrary group G, if N is a normal
subgroup of G, then N' must also be a normal subgroup of G (Problem 5).

We claim that if N is a normal subgroup of G = S_n, where n ≥ 5, which
contains every 3-cycle in S_n, then N' must also contain every 3-cycle. For
suppose a = (1, 2, 3), b = (1, 4, 5) are in N (we are using here that
n ≥ 5); then a^(-1)b^(-1)ab = (3, 2, 1)(5, 4, 1)(1, 2, 3)(1, 4, 5) = (1, 4, 2), as
a commutator of elements of N, must be in N'. Since N' is a normal
subgroup of G, for any π ∈ S_n, π^(-1)(1, 4, 2)π must also be in N'. Choose a
π in S_n such that π(1) = i_1, π(4) = i_2, and π(2) = i_3, where i_1, i_2, i_3 are
any three distinct integers in the range from 1 to n; then π^(-1)(1, 4, 2)π =
(i_1, i_2, i_3) is in N'. Thus N' contains all 3-cycles.

Letting N = G, which is certainly normal in G and contains all 3-cycles,
we get that G' contains all 3-cycles; since G' is normal in G, G^(2) contains
all 3-cycles; since G^(2) is normal in G, G^(3) contains all 3-cycles.
Continuing this way we obtain that G^(k) contains all 3-cycles for arbitrary k.
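The commutator computed in the proof can be checked mechanically. The sketch
below assumes the convention that a product of permutations is applied from
left to right (the leftmost factor acts first), which is the convention under
which the product in the proof equals (1, 4, 2); the helper names cycle and
product are hypothetical and exist only for this check.

```python
def cycle(*points, n=5):
    """The cycle (points[0], points[1], ...) as a mapping on {1, ..., n}."""
    perm = {i: i for i in range(1, n + 1)}
    for k, x in enumerate(points):
        perm[x] = points[(k + 1) % len(points)]
    return perm


def product(*perms):
    """Product of permutations, applied left to right."""
    result = {}
    for i in perms[0]:
        j = i
        for p in perms:
            j = p[j]
        result[i] = j
    return result


a, b = cycle(1, 2, 3), cycle(1, 4, 5)
a_inv, b_inv = cycle(3, 2, 1), cycle(5, 4, 1)

# a^-1 b^-1 a b, written out as in the proof of Lemma 5.7.2.
commutator = product(a_inv, b_inv, a, b)
print(commutator == cycle(1, 4, 2))
```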

A direct consequence of this lemma is the interesting group-theoretic 
result. 

THEOREM 5.7.1 S_n is not solvable for n ≥ 5.

Proof. If G = S_n, by Lemma 5.7.2, G^(k) contains all 3-cycles in S_n for
every k. Therefore, G^(k) ≠ (e) for any k, whence by Lemma 5.7.1, G cannot
be solvable.

We now interrelate the solvability by radicals of p(x ) with the solvability, 
as a group, of the Galois group of p(x). The very terminology is highly 
suggestive that such a relation exists. But first we need a result about the 
Galois group of a certain type of polynomial. 

LEMMA 5.7.3 Suppose that the field F has all nth roots of unity (for some
particular n) and suppose that a ≠ 0 is in F. Let x^n - a ∈ F[x] and let K be
its splitting field over F. Then

1. K = F(u) where u is any root of x^n - a.

2. The Galois group of x^n - a over F is abelian.

Proof. Since F contains all nth roots of unity, it contains ξ = e^(2πi/n);
note that ξ^n = 1 but ξ^m ≠ 1 for 0 < m < n.

If u ∈ K is any root of x^n - a, then u, ξu, ξ^2 u, ..., ξ^(n-1) u are all the
roots of x^n - a. That they are roots is clear; that they are distinct follows
from: if ξ^i u = ξ^j u with 0 ≤ i < j < n, then since u ≠ 0 and (ξ^i - ξ^j)u = 0,
we must have ξ^i = ξ^j, which is impossible since it gives ξ^(j-i) = 1, with
0 < j - i < n. Since ξ ∈ F, all of u, ξu, ..., ξ^(n-1) u are in F(u), thus F(u) splits
x^n - a; since no proper subfield of F(u) which contains F also contains u,
no proper subfield of F(u) can split x^n - a. Thus F(u) is the splitting
field of x^n - a, and we have proved that K = F(u).

If σ, τ are any two elements in the Galois group of x^n - a, that is, if
σ, τ are automorphisms of K = F(u) leaving every element of F fixed, then
since both σ(u) and τ(u) are roots of x^n - a, σ(u) = ξ^i u and τ(u) = ξ^j u
for some i and j. Thus στ(u) = σ(ξ^j u) = ξ^j σ(u) (since ξ^j ∈ F) = ξ^j ξ^i u =
ξ^(i+j) u; similarly, τσ(u) = ξ^(i+j) u. Therefore, στ and τσ agree on u and on
F, hence on all of K = F(u). But then στ = τσ, whence the Galois group
is abelian.

Note that the lemma says that when F has all nth roots of unity, then
adjoining one root of x^n - a to F, where a ∈ F, gives us the whole splitting
field of x^n - a; thus this must be a normal extension of F.
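The description of the roots in Lemma 5.7.3 is easy to observe numerically.
In the sketch below we take, purely for illustration, F to be the field of
complex numbers (which certainly contains all nth roots of unity) and use
floating-point arithmetic; the function name and the sample values n = 5,
a = 3 are assumptions of the example. Starting from one root u, the remaining
roots are obtained as ξ^k u with ξ = e^(2πi/n).

```python
import cmath


def roots_of_x_n_minus_a(n, a):
    """All roots of x^n - a (a != 0), built from one root u and xi = e^(2*pi*i/n)."""
    u = complex(a) ** (1 / n)             # one particular root of x^n - a
    xi = cmath.exp(2j * cmath.pi / n)     # xi^n = 1, xi^m != 1 for 0 < m < n
    return [xi**k * u for k in range(n)]


roots = roots_of_x_n_minus_a(5, 3)

# Each xi^k * u is a root, and the n values are pairwise distinct.
residuals = [abs(r**5 - 3) for r in roots]
min_gap = min(abs(r - s) for i, r in enumerate(roots) for s in roots[i + 1:])
print(max(residuals), min_gap)
```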

We assume for the rest of the section that F is a field which contains all nth roots
of unity for every integer n. We have

THEOREM 5.7.2 If p(x) ∈ F[x] is solvable by radicals over F, then the Galois
group over F of p(x) is a solvable group.

Proof. Let K be the splitting field of p(x) over F; the Galois group of
p(x) over F is G(K, F). Since p(x) is solvable by radicals, there exists a
sequence of fields

    F ⊂ F_1 = F(ω_1) ⊂ F_2 = F_1(ω_2) ⊂ ... ⊂ F_k = F_(k-1)(ω_k),

where ω_1^(r_1) ∈ F, ω_2^(r_2) ∈ F_1, ..., ω_k^(r_k) ∈ F_(k-1) and where K ⊂ F_k. As we
pointed out, without loss of generality we may assume that F_k is a normal
extension of F. As a normal extension of F, F_k is also a normal extension
of any intermediate field, hence F_k is a normal extension of each F_i.

By Lemma 5.7.3 each F_i is a normal extension of F_(i-1) and since F_k is
normal over F_(i-1), by Theorem 5.6.6, G(F_k, F_i) is a normal subgroup in
G(F_k, F_(i-1)). Consider the chain

    G(F_k, F) ⊃ G(F_k, F_1) ⊃ G(F_k, F_2) ⊃ ... ⊃ G(F_k, F_(k-1)) ⊃ (e).   (1)

As we just remarked, each subgroup in this chain is a normal subgroup
in the one preceding it. Since F_i is a normal extension of F_(i-1), by the
fundamental theorem of Galois theory (Theorem 5.6.6) the group of F_i
over F_(i-1), G(F_i, F_(i-1)), is isomorphic to G(F_k, F_(i-1))/G(F_k, F_i). However,
by Lemma 5.7.3, G(F_i, F_(i-1)) is an abelian group. Thus each quotient
group G(F_k, F_(i-1))/G(F_k, F_i) of the chain (1) is abelian.

Thus the group G(F_k, F) is solvable! Since K ⊂ F_k and is a normal
extension of F (being a splitting field), by Theorem 5.6.6, G(F_k, K)
is a normal subgroup of G(F_k, F) and G(K, F) is isomorphic to
G(F_k, F)/G(F_k, K). Thus G(K, F) is a homomorphic image of G(F_k, F), a
solvable group; by the corollary to Lemma 5.7.1, G(K, F) itself must then
be a solvable group. Since G(K, F) is the Galois group of p(x) over F the
theorem has been proved.

We make two remarks without proof.

1. The converse of Theorem 5.7.2 is also true; that is, if the Galois group
of p(x) over F is solvable then p(x) is solvable by radicals over F.
2. Theorem 5.7.2 and its converse are true even if F does not contain
roots of unity.

Recalling what is meant by the general polynomial of degree n over F,
p(x) = x^n + a_1 x^(n-1) + ... + a_n, and what is meant by solvable by radicals,
we close with the great, classic theorem of Abel:

THEOREM 5.7.3 The general polynomial of degree n ≥ 5 is not solvable by
radicals.

Proof. In Theorem 5.6.3 we saw that if F(a_1, ..., a_n) is the field of
rational functions in the n variables a_1, ..., a_n, then the Galois group of
the polynomial p(t) = t^n + a_1 t^(n-1) + ... + a_n over F(a_1, ..., a_n) was S_n,
the symmetric group of degree n. By Theorem 5.7.1, S_n is not a solvable
group when n ≥ 5, thus by Theorem 5.7.2, p(t) is not solvable by radicals
over F(a_1, ..., a_n) when n ≥ 5.

Problems 

*1. If p(x) is solvable by radicals over F, prove that we can find a sequence
of fields

    F ⊂ F_1 = F(ω_1) ⊂ F_2 = F_1(ω_2) ⊂ ... ⊂ F_k = F_(k-1)(ω_k),

where ω_1^(r_1) ∈ F, ω_2^(r_2) ∈ F_1, ..., ω_k^(r_k) ∈ F_(k-1), F_k containing all the
roots of p(x), such that F_k is normal over F.

2. Prove that a subgroup of a solvable group is solvable.

3. Prove that S_4 is a solvable group.

4. If G is a group, prove that all G^(k) are normal subgroups of G.

5. If N is a normal subgroup of G prove that N' must also be a normal
subgroup of G.

6. Prove that the alternating group (the group of even permutations in
S_n) A_n has no nontrivial normal subgroups for n ≥ 5.

5.8 Galois Groups over the Rationals 

In Theorem 5.3.2 we saw that, given a field F and a polynomial p(x), of
degree n, in F[x], then the splitting field of p(x) over F has degree at most
n! over F. In the preceding section we saw that this upper limit of n! is,
indeed, taken on for some choice of F and some polynomial p(x) of degree
n over F. In fact, if F_0 is any field and if F is the field of rational functions
in the variables a_1, ..., a_n over F_0, it was shown that the splitting field, K,
of the polynomial p(x) = x^n + a_1 x^(n-1) + ... + a_n over F has degree
exactly n! over F. Moreover, it was shown that the Galois group of K over
F is S_n, the symmetric group of degree n. This turned out to be the basis
for the fact that the general polynomial of degree n, with n ≥ 5, is not
solvable by radicals.

However, it would be nice to know that the phenomenon described 
above can take place with fields which are more familiar to us than the 
field of rational functions in n variables. What we shall do will show that 
for any prime number p, at least, we can find polynomials of degree p over 
the field of rational numbers whose splitting fields have degree p! over the
rationals. This way we will have polynomials with rational coefficients
whose Galois group over the rationals is S_p. In light of Theorem 5.7.2, we
will conclude from this that the roots of these polynomials cannot be ex¬ 
pressed in combinations of radicals involving rational numbers. Although 
in proving Theorem 5.7.2 we used that roots of unity were in the field, and 
roots of unity do not lie in the rationals, we make use of remark 2 following 
the proof of Theorem 5.7.2 here, namely that Theorem 5.7.2 remains valid 
even in the absence of roots of unity. 

We shall make use of the fact that polynomials with rational coefficients 
have all their roots in the complex field. 

We now prove 

THEOREM 5.8.1 Let q(x) be an irreducible polynomial of degree p, p a prime,
over the field Q of rational numbers. Suppose that q(x) has exactly two nonreal roots
in the field of complex numbers. Then the Galois group of q(x) over Q is $S_p$, the
symmetric group of degree p. Thus the splitting field of q(x) over Q has degree p!
over Q.

Proof. Let K be the splitting field of the polynomial q(x) over Q. If
$\alpha$ is a root of q(x) in K, then, since q(x) is irreducible over Q, by Theorem
5.1.3, $[Q(\alpha):Q] = p$. Since $K \supset Q(\alpha) \supset Q$ and, according to Theorem
5.1.1, $[K:Q] = [K:Q(\alpha)][Q(\alpha):Q] = [K:Q(\alpha)]p$, we have that $p \mid [K:Q]$.
If G is the Galois group of K over Q, by Theorem 5.6.4, $o(G) = [K:Q]$.
Thus $p \mid o(G)$. Hence, by Cauchy's theorem (Theorem 2.11.3), G has
an element $\sigma$ of order p.

To this point we have not used our hypothesis that q(x) has exactly two
nonreal roots. We use it now. If $\alpha_1, \alpha_2$ are these nonreal roots, then
$\bar{\alpha}_1 = \alpha_2$, $\bar{\alpha}_2 = \alpha_1$ (see Problem 13, Section 5.3), where the bar denotes
the complex conjugate. If $\alpha_3, \ldots, \alpha_p$ are the other roots, then, since they
are real, $\bar{\alpha}_i = \alpha_i$ for $i \geq 3$. Thus the complex conjugate mapping takes
K into itself, is an automorphism $\tau$ of K over Q, and interchanges $\alpha_1$ and
$\alpha_2$, leaving the other roots of q(x) fixed.

Now, the elements of G take roots of q(x) into roots of q(x), so induce
permutations of $\alpha_1, \ldots, \alpha_p$. In this way we imbed G in $S_p$. The
automorphism $\tau$ described above is the transposition (1, 2) since $\tau(\alpha_1) = \alpha_2$,
$\tau(\alpha_2) = \alpha_1$, and $\tau(\alpha_i) = \alpha_i$ for $i \geq 3$. What about the element $\sigma \in G$,
which we mentioned above, which has order p? As an element of $S_p$,
$\sigma$ has order p. But the only elements of order p in $S_p$ are p-cycles. Thus $\sigma$
must be a p-cycle.

Therefore G, as a subgroup of $S_p$, contains a transposition and a p-cycle.
It is a relatively easy exercise (see Problem 4) to prove that any transposition
and any p-cycle in $S_p$ generate $S_p$. Thus $\sigma$ and $\tau$ generate $S_p$. But since
they are in G, the group generated by $\sigma$ and $\tau$ must be in G. The net result
of this is that $G = S_p$. In other words, the Galois group of q(x) over Q is
indeed $S_p$. This proves the theorem.
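
For small p the group-theoretic fact used in the last step can be checked by brute force. The following is only a sketch in Python and no substitute for Problem 4; the helper names `compose` and `generated_subgroup` are ours, and permutations of {0, ..., 4} stand in for permutations of the five roots. It closes a transposition and a 5-cycle under composition and counts the subgroup obtained.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'first p, then q' on {0, ..., n-1}."""
    return tuple(q[p[i]] for i in range(len(p)))

def generated_subgroup(generators):
    """Close the generators under composition; in a finite group this is the subgroup they generate."""
    n = len(generators[0])
    identity = tuple(range(n))
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen

p = 5
transposition = (1, 0, 2, 3, 4)   # interchanges the first two symbols
p_cycle = (1, 2, 3, 4, 0)         # sends i to i + 1 modulo 5
group = generated_subgroup([transposition, p_cycle])

print(len(group))                               # order of the generated subgroup
print(group == set(permutations(range(p))))     # is it all of S_5?
```

Replacing the two generators by another transposition and another 5-cycle should again produce all 5! = 120 permutations, in line with Problem 4.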

The theorem gives us a fairly general criterion to get $S_p$ as a Galois group
over Q. Now we must produce polynomials of degree p over the rationals
which are irreducible over Q and have exactly two nonreal roots. To
produce irreducible polynomials, we use the Eisenstein criterion (Theorem
3.10.2). To get all but two real roots one can play around with the
coefficients, but always staying in a context where the Eisenstein criterion is
in force.

We do it explicitly for p = 5. Let $q(x) = 2x^5 - 10x + 5$. By the
Eisenstein criterion (with the prime 5), q(x) is irreducible over Q. We graph
$y = q(x) = 2x^5 - 10x + 5$. By elementary calculus it has a local maximum
at x = -1, where q(-1) = 13 > 0, and a local minimum at x = 1, where
q(1) = -3 < 0 (see Figure 5.8.1). As the graph clearly indicates,
$y = q(x) = 2x^5 - 10x + 5$ crosses the x-axis exactly three times, so q(x)
has exactly three roots which are real. Hence the other two roots must be
complex, nonreal numbers. Therefore q(x) satisfies the hypothesis of
Theorem 5.8.1, in consequence of which the Galois group of q(x) over Q
is $S_5$. Using Theorem 5.7.2, we know that it is not possible to express the
roots of q(x) in a combination of radicals of rational numbers.

[Figure 5.8.1: graph of $y = q(x) = 2x^5 - 10x + 5$]
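
The two hypotheses of Theorem 5.8.1 can also be checked mechanically for this q(x). The sketch below is ours and is not the argument of the text: in place of the graph it uses Sturm's theorem, which counts the distinct real roots of a polynomial without repeated roots as the difference between the numbers of sign changes of its Sturm sequence at $-\infty$ and at $+\infty$. All function names are hypothetical helpers; coefficients are listed from the highest degree down.

```python
from fractions import Fraction

def derivative(coeffs):
    """Derivative of a polynomial; coefficients are listed highest degree first."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def poly_rem(a, b):
    """Remainder of a on division by b, over the rationals."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(b):
        factor = a[0] / b[0]
        for i in range(len(b)):
            a[i] -= factor * b[i]
        a.pop(0)                      # the leading coefficient is now zero
    while a and a[0] == 0:            # strip any remaining leading zeros
        a.pop(0)
    return a

def sturm_sequence(coeffs):
    """p0 = q, p1 = q', and p_{k+1} = -(remainder of p_{k-1} by p_k)."""
    seq = [[Fraction(c) for c in coeffs],
           [Fraction(c) for c in derivative(coeffs)]]
    while True:
        r = poly_rem(seq[-2], seq[-1])
        if not r:
            return seq
        seq.append([-c for c in r])

def count_real_roots(coeffs):
    """Number of distinct real roots, by Sturm's theorem."""
    seq = sturm_sequence(coeffs)
    def changes(signs):
        return sum(1 for s, t in zip(signs, signs[1:]) if s != t)
    at_plus = [1 if p[0] > 0 else -1 for p in seq]
    at_minus = [(1 if p[0] > 0 else -1) * (-1) ** (len(p) - 1) for p in seq]
    return changes(at_minus) - changes(at_plus)

def eisenstein_applies(coeffs, prime):
    """Eisenstein criterion for integer coefficients and the given prime."""
    lead, *rest = coeffs
    return (lead % prime != 0
            and all(c % prime == 0 for c in rest)
            and rest[-1] % (prime * prime) != 0)

q = [2, 0, 0, 0, -10, 5]              # 2x^5 - 10x + 5
print(eisenstein_applies(q, 5))       # irreducibility test with the prime 5
print(count_real_roots(q))            # number of real roots
```

For this q(x) the count of real roots agrees with the three crossings read off from the graph, leaving exactly two nonreal roots.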

Problems

1. In S 5 show that (1 2) and (1 2 3 4 5) generate S 5 . 

2. In S 5 show that (12) and (1 3 2 4 5) generate S 5 . 

3. If p > 2 is a prime, show that (1 2) and (1 2 ... p-1 p) generate $S_p$.

4. Prove that any transposition and p-cycle in $S_p$, p a prime, generate $S_p$.

5. Show that the following polynomials over Q are irreducible and have 
exactly two nonreal roots. 

(a) $p(x) = x^3 - 3x - 3$,

(b) $p(x) = x^5 - 6x + 3$,

(c) $p(x) = x^5 + 5x^4 + 10x^3 + 10x^2 - x - 2$.

6. What are the Galois groups over Q of the polynomials in Problem 5? 

7. Construct a polynomial of degree 7 with rational coefficients whose
Galois group over Q is S 7 . 

Linear Transformations


In Chapter 4 we defined, for any two vector spaces V and W over the
same field F, the set Hom (V, W) of all vector space homomorphisms
of V into W. In fact, we introduced into Hom (V, W) the operations
of addition and of multiplication by scalars (elements of F) in such a
way that Hom (V, W) itself became a vector space over F.

Of much greater interest is the special case V = W, for here, in
addition to the vector space operations, we can introduce a
multiplication for any two elements under which Hom (V, V) becomes a
ring. Blessed with this twin nature, that of a vector space and of a
ring, Hom (V, V) acquires an extremely rich structure. It is this
structure and its consequences that impart so much life and sparkle
to the subject and which justify most fully the creation of the abstract
concept of a vector space.

Our main concern shall be concentrated on Hom (V, V) where V
will not be an arbitrary vector space but rather will be restricted to be
a finite-dimensional vector space over a field F. The
finite-dimensionality of V imposes on Hom (V, V) the consequence that
each of its elements satisfies a polynomial over F. This fact, perhaps
more than any other, gives us a ready entry into Hom (V, V) and
allows us to probe both deeply and effectively into its structure.

The subject matter to be considered often goes under the name of 
linear algebra. It encompasses the isomorphic theory of matrices. The 
statement that its results are in constant everyday use in every aspect 
of mathematics (and elsewhere) is not in the least exaggerated. 

A popular myth is that mathematicians revel in the inapplicability of
their discipline and are disappointed when one of their results is "soiled"
by use in the outside world. This is sheer nonsense! It is true that a
mathematician does not depend for his value judgments on the applicability of a
given result outside of mathematics proper but relies, rather, on some
intrinsic, and at times intangible, mathematical criteria. However, it is
equally true that the converse is false: the utility of a result has never
lowered its mathematical value. A perfect case in point is the subject of
linear algebra; it is real mathematics, interesting and exciting on its own,
yet it is probably that part of mathematics which finds the widest
application: in physics, chemistry, economics, in fact in almost every science and
pseudoscience.


6.1 The Algebra of Linear Transformations 

Let V be a vector space over a field F and let Hom (V, V), as before, be
the set of all vector-space-homomorphisms of V into itself. In Section 4.3
we showed that Hom (V, V) forms a vector space over F, where, for
$T_1, T_2 \in$ Hom (V, V), $T_1 + T_2$ is defined by $v(T_1 + T_2) = vT_1 + vT_2$
for all $v \in V$ and where, for $\alpha \in F$, $\alpha T_1$ is defined by $v(\alpha T_1) = \alpha(vT_1)$.

For $T_1, T_2 \in$ Hom (V, V), since $vT_1 \in V$ for any $v \in V$, $(vT_1)T_2$ makes
sense. As we have done for mappings of any set into itself, we define
$T_1T_2$ by $v(T_1T_2) = (vT_1)T_2$ for any $v \in V$. We now claim that $T_1T_2 \in$
Hom (V, V). To prove this, we must show that for all $\alpha, \beta \in F$ and all
$u, v \in V$, $(\alpha u + \beta v)(T_1T_2) = \alpha(u(T_1T_2)) + \beta(v(T_1T_2))$. We compute

$$
\begin{aligned}
(\alpha u + \beta v)(T_1T_2) &= ((\alpha u + \beta v)T_1)T_2 \\
&= (\alpha(uT_1) + \beta(vT_1))T_2 \\
&= \alpha(uT_1)T_2 + \beta(vT_1)T_2 \\
&= \alpha(u(T_1T_2)) + \beta(v(T_1T_2)).
\end{aligned}
$$

We leave as an exercise the following properties of this product in
Hom (V, V):

1. $(T_1 + T_2)T_3 = T_1T_3 + T_2T_3$;

2. $T_3(T_1 + T_2) = T_3T_1 + T_3T_2$;

3. $T_1(T_2T_3) = (T_1T_2)T_3$;

4. $\alpha(T_1T_2) = (\alpha T_1)T_2 = T_1(\alpha T_2)$;

for all $T_1, T_2, T_3 \in$ Hom (V, V) and all $\alpha \in F$.

Note that properties 1, 2, 3, above, are exactly what are required to
make of Hom (V, V) an associative ring. Property 4 intertwines the
character of Hom (V, V), as a vector space over F, with its character as a
ring.

Note further that there is an element, I, in Hom (V, V), defined by
$vI = v$ for all $v \in V$, with the property that $TI = IT = T$ for every $T \in$
Hom (V, V). Thereby, Hom (V, V) is a ring with a unit element.
Moreover, if in property 4 above we put $T_2 = I$, we obtain $\alpha T_1 = T_1(\alpha I)$.
Since $(\alpha I)T_1 = \alpha(IT_1) = \alpha T_1$, we see that $(\alpha I)T_1 = T_1(\alpha I)$ for all $T_1 \in$
Hom (V, V), and so $\alpha I$ commutes with every element of Hom (V, V).
We shall always write, in the future, $\alpha I$ merely as $\alpha$.

DEFINITION An associative ring A is called an algebra over F if A is a
vector space over F such that for all $a, b \in A$ and $\alpha \in F$, $\alpha(ab) = (\alpha a)b =
a(\alpha b)$.

Homomorphisms, isomorphisms, ideals, etc., of algebras are defined as 
for rings with the additional proviso that these must preserve, or be
invariant under, the vector space structure.

Our remarks above indicate that Hom (V, V) is an algebra over F. For
convenience of notation we henceforth shall write Hom (V, V) as A(V);
whenever we want to emphasize the role of the field F we shall denote it by
$A_F(V)$.

DEFINITION A linear transformation on V, over F, is an element of $A_F(V)$.

We shall, at times, refer to A(V) as the ring, or algebra, of linear
transformations on V.

For arbitrary algebras A, with unit element, over a field F, we can prove 
the analog of Cayley’s theorem for groups; namely, 

LEMMA 6.1.1 If A is an algebra, with unit element, over F, then A is isomorphic 
to a subalgebra of A(V) for some vector space V over F. 

Proof. Since A is an algebra over F, it must be a vector space over F.
We shall use V = A to prove the lemma.

If $a \in A$, let $T_a: A \to A$ be defined by $vT_a = va$ for every $v \in A$. We
assert that $T_a$ is a linear transformation on V (= A). By the
right-distributive law $(v_1 + v_2)T_a = (v_1 + v_2)a = v_1a + v_2a = v_1T_a + v_2T_a$. Since A
is an algebra, $(\alpha v)T_a = (\alpha v)a = \alpha(va) = \alpha(vT_a)$ for $v \in A$, $\alpha \in F$. Thus
$T_a$ is indeed a linear transformation on A.

Consider the mapping $\psi: A \to A(V)$ defined by $a\psi = T_a$ for every
$a \in A$. We claim that $\psi$ is an isomorphism of A into A(V). To begin with,
if $a, b \in A$ and $\alpha, \beta \in F$, then for all $v \in A$, $vT_{\alpha a + \beta b} = v(\alpha a + \beta b) =
\alpha(va) + \beta(vb)$ [by the left-distributive law and the fact that A is an algebra
over F] $= \alpha(vT_a) + \beta(vT_b) = v(\alpha T_a + \beta T_b)$ since both $T_a$ and $T_b$ are
linear transformations. In consequence, $T_{\alpha a + \beta b} = \alpha T_a + \beta T_b$, whence $\psi$
is a vector-space homomorphism of A into A(V). Next, we compute, for
$a, b \in A$, $vT_{ab} = v(ab) = (va)b = (vT_a)T_b = v(T_aT_b)$ (we have used
the associative law of A in this computation), which implies that $T_{ab} =
T_aT_b$. In this way, $\psi$ is also a ring-homomorphism of A. So far we have
proved that $\psi$ is a homomorphism of A, as an algebra, into A(V). All that
remains is to determine the kernel of $\psi$. Let $a \in A$ be in the kernel of $\psi$;
then $a\psi = 0$, whence $T_a = 0$ and so $vT_a = 0$ for all $v \in V$. Now V = A, and
A has a unit element, e, hence $eT_a = 0$. However, $0 = eT_a = ea = a$,
proving that a = 0. The kernel of $\psi$ must therefore merely consist of 0,
thus implying that $\psi$ is an isomorphism of A into A(V). This completes the
proof of the lemma.

The lemma points out the universal role played by the particular algebras, 
A(V), for in these we can find isomorphic copies of any algebra. 
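
To see the lemma at work in one small case, here is a sketch (ours, not the text's) for the two-dimensional algebra A = Q(√2) over Q, with basis 1, √2. The choice of algebra, the pair encoding (p, q) for p + q√2, and the helper names are all assumptions made for illustration. Since the text lets transformations act on the right, $T_a$ is stored as a matrix whose i-th row holds the coordinates of (i-th basis element)·a, so that composition corresponds to the matrix product taken in the same order.

```python
from fractions import Fraction

def mul(a, b):
    """Product in A = Q(sqrt 2); the pair (p, q) stands for p + q*sqrt(2)."""
    (p, q), (r, s) = a, b
    return (p * r + 2 * q * s, p * s + q * r)

BASIS = [(Fraction(1), Fraction(0)), (Fraction(0), Fraction(1))]   # 1 and sqrt(2)

def T(a):
    """Matrix of T_a: v -> va; row i holds the coordinates of (i-th basis element) * a."""
    return [list(mul(v, a)) for v in BASIS]

def mat_mul(M, N):
    """Ordinary matrix product."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

a = (Fraction(1), Fraction(2))     # 1 + 2*sqrt(2)
b = (Fraction(3), Fraction(-1))    # 3 - sqrt(2)

print(T(mul(a, b)) == mat_mul(T(a), T(b)))   # T_{ab} = T_a T_b
print(T(a)[0] == list(a))                    # e T_a = a, so T_a = 0 forces a = 0
```

The second check mirrors the kernel computation in the proof: the first row of $T_a$ is $eT_a = a$ itself.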

Let A be an algebra, with unit element e, over F, and let $p(x) = \alpha_0 +
\alpha_1 x + \cdots + \alpha_n x^n$ be a polynomial in F[x]. For $a \in A$, by p(a), we shall
mean the element $\alpha_0 e + \alpha_1 a + \cdots + \alpha_n a^n$ in A. If p(a) = 0 we shall say
a satisfies p(x).

LEMMA 6.1.2 Let A be an algebra, with unit element, over F, and suppose that
A is of dimension m over F. Then every element in A satisfies some nontrivial
polynomial in F[x] of degree at most m.

Proof. Let e be the unit element of A; if $a \in A$, consider the m + 1
elements $e, a, a^2, \ldots, a^m$ in A. Since A is m-dimensional over F, by Lemma
4.2.4, $e, a, a^2, \ldots, a^m$, being m + 1 in number, must be linearly dependent
over F. In other words, there are elements $\alpha_0, \alpha_1, \ldots, \alpha_m$ in F, not all
0, such that $\alpha_0 e + \alpha_1 a + \cdots + \alpha_m a^m = 0$. But then a satisfies the
nontrivial polynomial $q(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_m x^m$, of degree at most m,
in F[x].

If V is a finite-dimensional vector space over F, of dimension n, by
Corollary 1 to Theorem 4.3.1, A(V) is of dimension $n^2$ over F. Since A(V)
is an algebra over F, we can apply Lemma 6.1.2 to it to obtain that every
element in A(V) satisfies a polynomial over F of degree at most $n^2$. This
fact will be of central significance in all that follows, so we single it out as

THEOREM 6.1.1 If V is an n-dimensional vector space over F, then, given any
element T in A(V), there exists a nontrivial polynomial $q(x) \in F[x]$ of degree at
most $n^2$, such that q(T) = 0.

We shall see later that we can assert much more about the degree of q(x);
in fact, we shall eventually be able to say that we can choose such a q(x)
of degree at most n. This fact is a famous theorem in the subject, and is
known as the Cayley-Hamilton theorem. For the moment we can get by
without any sharp estimate of the degree of q(x); all we need is that a
suitable q(x) exists.

Since for finite-dimensional V, given $T \in A(V)$, some polynomial q(x)
exists for which q(T) = 0, a nontrivial polynomial of lowest degree with
this property, p(x), exists in F[x]. We call p(x) a minimal polynomial for T
over F. If T satisfies a polynomial h(x), then $p(x) \mid h(x)$.
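
The proof of Lemma 6.1.2 is effectively an algorithm: list the powers 1, T, $T^2$, ... and stop at the first one that is a linear combination of its predecessors. The sketch below carries this out over the rationals. It assumes that T is given by a square matrix (the passage to matrices is only made in Section 6.3), and the helper names `solve_combination` and `minimal_polynomial` are ours. The returned list holds the coefficients of a monic polynomial of lowest degree satisfied by T, constant term first.

```python
from fractions import Fraction

def mat_mul(M, N):
    """Ordinary matrix product."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def solve_combination(vectors, target):
    """Return c with sum(c[i] * vectors[i]) == target, or None if there is none."""
    rows, cols = len(target), len(vectors)
    aug = [[Fraction(vectors[j][i]) for j in range(cols)] + [Fraction(target[i])]
           for i in range(rows)]
    pivots, r = [], 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if aug[i][c] != 0), None)
        if pivot is None:
            continue
        aug[r], aug[pivot] = aug[pivot], aug[r]
        aug[r] = [x / aug[r][c] for x in aug[r]]
        for i in range(rows):
            if i != r and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [x - f * y for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    if any(row[-1] != 0 for row in aug[r:]):
        return None                       # the system is inconsistent
    coeffs = [Fraction(0)] * cols
    for row_index, c in enumerate(pivots):
        coeffs[c] = aug[row_index][-1]
    return coeffs

def minimal_polynomial(T):
    """Coefficients (constant term first) of the monic minimal polynomial of T."""
    n = len(T)
    def flat(M):
        return [x for row in M for x in row]
    powers = [[[Fraction(int(i == j)) for j in range(n)] for i in range(n)]]
    while True:
        nxt = mat_mul(powers[-1], T)
        c = solve_combination([flat(P) for P in powers], flat(nxt))
        if c is not None:
            # T^k = sum c_i T^i, so T satisfies x^k - sum c_i x^i
            return [-x for x in c] + [Fraction(1)]
        powers.append(nxt)

print(minimal_polynomial([[1, 2], [3, 4]]))   # coefficients of x^2 - 5x - 2
print(minimal_polynomial([[0, 1], [0, 0]]))   # coefficients of x^2
```

The loop must stop by the time $n^2 + 1$ powers have been listed, which is exactly the content of Theorem 6.1.1.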

DEFINITION An element $T \in A(V)$ is called right-invertible if there exists
an $S \in A(V)$ such that TS = 1. (Here 1 denotes the unit element of A(V).)

Similarly, we can define left-invertible, if there is a $U \in A(V)$ such
that UT = 1. If T is both right- and left-invertible and if TS = UT = 1,
it is an easy exercise that S = U and that S is unique.

DEFINITION An element T in A(V) is invertible or regular if it is both
right- and left-invertible; that is, if there is an element $S \in A(V)$ such that
ST = TS = 1. We write S as $T^{-1}$.


An element in A(V) which is not regular is called singular. 

It is quite possible that an element in A(V) is right-invertible but is not
invertible. An example of such: Let F be the field of real numbers and let
V be F[x], the set of all polynomials in x over F. In V let S be defined by

$$q(x)S = \frac{d}{dx}\, q(x)$$

and T by

$$q(x)T = \int_1^x q(x)\,dx.$$

Then $ST \neq 1$, whereas TS = 1. As we shall see in a moment, if V is
finite-dimensional over F, then an element in A(V) which is right-invertible
is invertible.
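
The example can be tried out on coefficient lists. In the sketch below (ours; a polynomial is stored as its list of coefficients, constant term first, and the function names S and T simply echo the text) the product TS means "apply T, then S", in keeping with the convention that transformations act on the right.

```python
from fractions import Fraction

def S(q):
    """q(x)S = d/dx q(x); coefficients are listed constant term first."""
    return [k * c for k, c in enumerate(q)][1:] or [Fraction(0)]

def T(q):
    """q(x)T = integral of q from 1 to x."""
    anti = [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(q)]
    anti[0] = -sum(anti)      # subtract the value of the antiderivative at x = 1
    return anti

q = [Fraction(3), Fraction(0), Fraction(1)]   # 3 + x^2

print(S(T(q)) == q)   # q(TS): integrate, then differentiate, recovers q
print(T(S(q)))        # q(ST): differentiate, then integrate, gives q(x) - q(1)
```

Differentiating first loses the constant term, and integrating from 1 can only restore q(x) - q(1); this is why ST fails to be the identity while TS succeeds.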


THEOREM 6.1.2 If V is finite-dimensional over F, then $T \in A(V)$ is
invertible if and only if the constant term of the minimal polynomial for T is not 0.

Proof. Let $p(x) = \alpha_0 + \alpha_1 x + \cdots + \alpha_k x^k$, $\alpha_k \neq 0$, be the minimal
polynomial for T over F.

If $\alpha_0 \neq 0$, since $0 = p(T) = \alpha_k T^k + \alpha_{k-1} T^{k-1} + \cdots + \alpha_1 T + \alpha_0$, we
obtain

$$
1 = T\left(-\frac{1}{\alpha_0}\left(\alpha_k T^{k-1} + \alpha_{k-1} T^{k-2} + \cdots + \alpha_1\right)\right)
  = \left(-\frac{1}{\alpha_0}\left(\alpha_k T^{k-1} + \cdots + \alpha_1\right)\right)T.
$$

Therefore,

$$S = -\frac{1}{\alpha_0}\left(\alpha_k T^{k-1} + \cdots + \alpha_1\right)$$

acts as an inverse for T, whence T is invertible.

Suppose, on the other hand, that T is invertible, yet $\alpha_0 = 0$. Thus
$0 = \alpha_1 T + \alpha_2 T^2 + \cdots + \alpha_k T^k = (\alpha_1 + \alpha_2 T + \cdots + \alpha_k T^{k-1})T$.
Multiplying this relation from the right by $T^{-1}$ yields $\alpha_1 + \alpha_2 T + \cdots +
\alpha_k T^{k-1} = 0$, whereby T satisfies the polynomial $q(x) = \alpha_1 + \alpha_2 x + \cdots +
\alpha_k x^{k-1}$ in F[x]. Since the degree of q(x) is less than that of p(x), this is
impossible. Consequently, $\alpha_0 \neq 0$ and the other half of the theorem is
established.

COROLLARY 1 If V is finite-dimensional over F and if $T \in A(V)$ is
invertible, then $T^{-1}$ is a polynomial expression in T over F.

Proof. Since T is invertible, by the theorem, $\alpha_0 + \alpha_1 T + \cdots +
\alpha_k T^k = 0$ with $\alpha_0 \neq 0$. But then

$$T^{-1} = -\frac{1}{\alpha_0}\left(\alpha_1 + \alpha_2 T + \cdots + \alpha_k T^{k-1}\right).$$
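
The formula of Corollary 1 is easy to try on a concrete case. The sketch below is ours: it assumes T is given by a 2 x 2 rational matrix and that a polynomial satisfied by T with nonzero constant term is already known (here $x^2 - 5x - 2$, which the code verifies before using it). The helper name `inverse_from_polynomial` is hypothetical.

```python
from fractions import Fraction

def mat_mul(M, N):
    """Ordinary matrix product."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def inverse_from_polynomial(T, alpha):
    """alpha[i] is the coefficient of x^i in a polynomial satisfied by T, alpha[0] != 0."""
    n = len(T)
    power = identity(n)                                   # T^0
    total = [[Fraction(0)] * n for _ in range(n)]
    for a in alpha[1:]:                                   # alpha_1 + alpha_2 T + ... + alpha_k T^(k-1)
        total = [[total[i][j] + a * power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, T)
    return [[-x / Fraction(alpha[0]) for x in row] for row in total]

T = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
alpha = [-2, -5, 1]                                       # x^2 - 5x - 2, constant term first

# Check that T really satisfies the polynomial before relying on it.
T2 = mat_mul(T, T)
I2 = identity(2)
p_of_T = [[T2[i][j] - 5 * T[i][j] - 2 * I2[i][j] for j in range(2)] for i in range(2)]
print(p_of_T == [[0, 0], [0, 0]])

T_inv = inverse_from_polynomial(T, alpha)
print(mat_mul(T, T_inv) == I2 and mat_mul(T_inv, T) == I2)
```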

COROLLARY 2 If V is finite-dimensional over F and if $T \in A(V)$ is singular,
then there exists an $S \neq 0$ in A(V) such that ST = TS = 0.

Proof. Because T is not regular, the constant term of its minimal
polynomial must be 0. That is, $p(x) = \alpha_1 x + \cdots + \alpha_k x^k$, whence $0 =
\alpha_1 T + \cdots + \alpha_k T^k$. If $S = \alpha_1 + \cdots + \alpha_k T^{k-1}$ then $S \neq 0$ (since
$\alpha_1 + \cdots + \alpha_k x^{k-1}$ is of lower degree than p(x)) and ST = TS = 0.

COROLLARY 3 If V is finite-dimensional over F and if $T \in A(V)$ is
right-invertible, then it is invertible.

Proof. Let TU = 1. If T were singular, there would be an $S \neq 0$
such that ST = 0. However, $0 = (ST)U = S(TU) = S1 = S \neq 0$,
a contradiction. Thus T is regular.

We wish to transfer the information contained in Theorem 6.1.2 and its
corollaries from A(V) to the action of T on V. A most basic result in this
vein is

THEOREM 6.1.3 If V is finite-dimensional over F, then $T \in A(V)$ is singular
if and only if there exists a $v \neq 0$ in V such that vT = 0.

Proof. By Corollary 2 to Theorem 6.1.2, T is singular if and only if
there is an $S \neq 0$ in A(V) such that ST = TS = 0. Since $S \neq 0$ there
is an element $w \in V$ such that $wS \neq 0$.

Let v = wS; then vT = (wS)T = w(ST) = w0 = 0. We have produced
a nonzero vector v in V which is annihilated by T. Conversely, if vT = 0
with $v \neq 0$, we leave as an exercise the fact that T is not invertible.

We seek still another characterization of the singularity or regularity of 
a linear transformation in terms of its overall action on V. 

DEFINITION If $T \in A(V)$, then the range of T, VT, is defined by $VT =
\{vT \mid v \in V\}$.

The range of T is easily shown to be a subvector space of V. It merely 
consists of all the images by T of the elements of V. Note that the range 
of T is all of V if and only if T is onto. 

THEOREM 6.1.4 If V is finite-dimensional over F, then $T \in A(V)$ is regular
if and only if T maps V onto V.

Proof. As happens so often, one-half of this is almost trivial; namely,
if T is regular then, given $v \in V$, $v = (vT^{-1})T$, whence VT = V and
T is onto.

On the other hand, suppose that T is not regular. We must show that
T is not onto. Since T is singular, by Theorem 6.1.3, there exists a vector
$v_1 \neq 0$ in V such that $v_1T = 0$. By Lemma 4.2.5 we can fill out, from $v_1$,
to a basis $v_1, v_2, \ldots, v_n$ of V. Then every element in VT is a linear
combination of the elements $w_1 = v_1T, w_2 = v_2T, \ldots, w_n = v_nT$. Since
$w_1 = 0$, VT is spanned by the n - 1 elements $w_2, \ldots, w_n$; therefore
$\dim VT \leq n - 1 < n = \dim V$. But then VT must be different from V;
that is, T is not onto.

Theorem 6.1.4 points out that we can distinguish regular elements from
singular ones, in the finite-dimensional case, according as their ranges are
or are not all of V. If $T \in A(V)$ this can be rephrased as: T is regular if
and only if dim (VT) = dim V. This suggests that we could use dim (VT)
not only as a test for regularity, but even as a measure of the degree of
singularity (or, lack of regularity) for a given $T \in A(V)$.

DEFINITION If V is finite-dimensional over F, then the rank of T is the 
dimension of VT, the range of T, over F. 

We denote the rank of T by r(T). At one end of the spectrum, if r(T) =
dim V, T is regular (and so, not at all singular). At the other end, if 
r(T) = 0, then T = 0 and so T is as singular as it can possibly be. The 
rank, as a function on A(V), is an important function, and we now investigate 
some of its properties. 

LEMMA 6.1.3 If V is finite-dimensional over F then for $S, T \in A(V)$:

1. $r(ST) \leq r(T)$;

2. $r(TS) \leq r(T)$;

(and so, $r(ST) \leq \min\{r(T), r(S)\}$)

3. $r(ST) = r(TS) = r(T)$ for S regular in A(V).

Proof. We go through 1, 2, and 3 in order.

1. Since $VS \subset V$, $V(ST) = (VS)T \subset VT$, whence, by Lemma 4.2.6,
$\dim (V(ST)) \leq \dim VT$; that is, $r(ST) \leq r(T)$.

2. Suppose that r(T) = m. Therefore, VT has a basis of m elements,
$w_1, w_2, \ldots, w_m$. But then (VT)S is spanned by $w_1S, w_2S, \ldots, w_mS$, hence
has dimension at most m. Since $r(TS) = \dim (V(TS)) = \dim ((VT)S) \leq
m = \dim VT = r(T)$, part 2 is proved.

3. If S is invertible then VS = V, whence V(ST) = (VS)T = VT.
Thereby, r(ST) = dim (V(ST)) = dim (VT) = r(T). On the other hand,
if VT has $w_1, \ldots, w_m$ as a basis, the regularity of S implies that $w_1S, \ldots,
w_mS$ are linearly independent. (Prove!) Since these span V(TS) they form
a basis of V(TS). But then r(TS) = dim (V(TS)) = dim (VT) = r(T).
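
A small numerical check of the three parts may be helpful. The sketch below is ours and assumes transformations are given by square matrices acting on row vectors from the right, so that the range VT is the row space of the matrix and r(T) can be found by Gaussian elimination; the matrices S, T, R are arbitrary illustrations.

```python
from fractions import Fraction

def mat_mul(M, N):
    """Ordinary matrix product."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def rank(M):
    """Dimension of the row space of M, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

S = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]   # singular, rank 2
T = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]   # singular, rank 2
R = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]   # regular

print(rank(mat_mul(S, T)), rank(mat_mul(T, S)))   # neither exceeds min(r(S), r(T))
print(rank(mat_mul(R, T)), rank(mat_mul(T, R)))   # both equal r(T)
```

Note that r(ST) and r(TS) need not agree when neither factor is regular, as the first line of output shows.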

COROLLARY If $T \in A(V)$ and if $S \in A(V)$ is regular, then $r(T) = r(STS^{-1})$.

Proof. By part 3 of the lemma, $r(STS^{-1}) = r(S(TS^{-1})) = r((TS^{-1})S) =
r(T)$.

Problems 

In all problems, unless stated otherwise, V will denote a finite-dimensional 
vector space over a field F. 

1. Prove that $S \in A(V)$ is regular if and only if whenever $v_1, \ldots, v_n \in V$
are linearly independent, then $v_1S, v_2S, \ldots, v_nS$ are also linearly
independent.

2. Prove that $T \in A(V)$ is completely determined by its values on a
basis of V.

3. Prove Lemma 6.1.1 even when A does not have a unit element. 

4. If A is the field of complex numbers and F is the field of real numbers,
then A is an algebra over F of dimension 2. For $a = \alpha + \beta i$ in A,
compute the action of $T_a$ (see Lemma 6.1.1) on a basis of A over F.

5. If V is two-dimensional over F and A = A(V), write down a basis
of A over F and compute $T_a$ for each a in this basis.

6. If $\dim_F V > 1$ prove that A(V) is not commutative.

7. In A(V) let $Z = \{T \in A(V) \mid ST = TS \text{ for all } S \in A(V)\}$. Prove that
Z merely consists of the multiples of the unit element of A(V) by the
elements of F.

*8. If $\dim_F V > 1$ prove that A(V) has no two-sided ideals other than
(0) and A(V).

**9. Prove that the conclusion of Problem 8 is false if V is not
finite-dimensional over F.

10. If V is an arbitrary vector space over F and if $T \in A(V)$ is both
right- and left-invertible, prove that the right inverse and left inverse
must be equal. From this, prove that the inverse of T is unique.

11. If V is an arbitrary vector space over F and if $T \in A(V)$ is
right-invertible with a unique right inverse, prove that T is invertible.

12. Prove that the regular elements in A(V) form a group. 

13. If F is the field of integers modulo 2 and if V is two-dimensional over 
F, compute the group of regular elements in A(V) and prove that 
this group is isomorphic to S 3 , the symmetric group of degree 3. 

*14. If F is a finite field with q elements, compute the order of the group 
of regular elements in A(V) where V is two-dimensional over F. 

*15. Do Problem 14 if V is assumed to be n-dimensional over F. 

*16. If V is finite-dimensional, prove that every element in A(V) can be 
written as a sum of regular elements. 

17. An element $E \in A(V)$ is called an idempotent if $E^2 = E$. If $E \in A(V)$
is an idempotent, prove that $V = V_0 \oplus V_1$ where $v_0E = 0$ for all
$v_0 \in V_0$ and $v_1E = v_1$ for all $v_1 \in V_1$.

18. If $T \in A_F(V)$, F of characteristic not 2, satisfies $T^3 = T$, prove
that $V = V_0 \oplus V_1 \oplus V_2$ where

(a) $v_0 \in V_0$ implies $v_0T = 0$.

(b) $v_1 \in V_1$ implies $v_1T = v_1$.

(c) $v_2 \in V_2$ implies $v_2T = -v_2$.

*19. If V is finite-dimensional and $T \neq 0 \in A(V)$, prove that there is
an $S \in A(V)$ such that $E = TS \neq 0$ is an idempotent.

20. The element $T \in A(V)$ is called nilpotent if $T^m = 0$ for some m. If
T is nilpotent and if $vT = \alpha v$ for some $v \neq 0$ in V, with $\alpha \in F$, prove
that $\alpha = 0$.

21. If $T \in A(V)$ is nilpotent, prove that $\alpha_0 + \alpha_1T + \alpha_2T^2 + \cdots +
\alpha_kT^k$ is regular, provided that $\alpha_0 \neq 0$.

22. If A is a finite-dimensional algebra over F and if $a \in A$, prove that
for some integer k > 0 and some polynomial $p(x) \in F[x]$, $a^k =
a^{k+1}p(a)$.

23. Using the result of Problem 22, prove that for $a \in A$ there is a
polynomial $q(x) \in F[x]$ such that $a^k = a^{2k}q(a)$.

24. Using the result of Problem 23, prove that given $a \in A$ either a is
nilpotent or there is an element $b \neq 0$ in A of the form b = ah(a)
where $h(x) \in F[x]$, such that $b^2 = b$.

25. If A is an algebra over F (not necessarily finite-dimensional) and if
for $a \in A$, $a^2 - a$ is nilpotent, prove that either a is nilpotent or there
is an element b of the form $b = ah(a) \neq 0$, where $h(x) \in F[x]$, such
that $b^2 = b$.

*26. If $T \neq 0 \in A(V)$ is singular, prove that there is an element $S \in A(V)$
such that TS = 0 but $ST \neq 0$.

27. Let V be two-dimensional over F with basis $v_1, v_2$. Suppose that
$T \in A(V)$ is such that $v_1T = \alpha v_1 + \beta v_2$, $v_2T = \gamma v_1 + \delta v_2$, where
$\alpha, \beta, \gamma, \delta \in F$. Find a nonzero polynomial in F[x] of degree 2 satisfied
by T.

28. If V is three-dimensional over F with basis $v_1, v_2, v_3$ and if $T \in A(V)$
is such that $v_iT = \alpha_{i1}v_1 + \alpha_{i2}v_2 + \alpha_{i3}v_3$ for i = 1, 2, 3, with all
$\alpha_{ij} \in F$, find a polynomial of degree 3 in F[x] satisfied by T.

29. Let V be n-dimensional over F with a basis $v_1, \ldots, v_n$. Suppose that
$T \in A(V)$ is such that

$v_1T = v_2, \quad v_2T = v_3, \quad \ldots, \quad v_{n-1}T = v_n,$

$v_nT = -\alpha_n v_1 - \alpha_{n-1}v_2 - \cdots - \alpha_1 v_n,$

where $\alpha_1, \ldots, \alpha_n \in F$. Prove that T satisfies the polynomial
$p(x) = x^n + \alpha_1 x^{n-1} + \alpha_2 x^{n-2} + \cdots + \alpha_n$ over F.

30. If $T \in A(V)$ satisfies a polynomial $q(x) \in F[x]$, prove that for
$S \in A(V)$, S regular, $STS^{-1}$ also satisfies q(x).

31. (a) If F is the field of rational numbers and if V is three-dimensional
over F with a basis $v_1, v_2, v_3$, compute the rank of $T \in A(V)$
defined by

$v_1T = v_1 - v_2$,
$v_2T = v_1 + v_3$,
$v_3T = v_2 + v_3$.

(b) Find a vector $v \in V$, $v \neq 0$, such that vT = 0.

32. Prove that the range of T and $U = \{v \in V \mid vT = 0\}$ are subspaces
of V.

33. If $T \in A(V)$, let $V_0 = \{v \in V \mid vT^k = 0 \text{ for some } k\}$. Prove that
$V_0$ is a subspace and that if $vT^m \in V_0$, then $v \in V_0$.

34. Prove that the minimal polynomial of T over F divides all polynomials
satisfied by T over F.

35. If n(T) is the dimension of the U of Problem 32 prove that r(T) +
n(T) = dim V.

6.2 Characteristic Roots

For the rest of this chapter our interest will be limited to linear
transformations on finite-dimensional vector spaces. Thus, henceforth, V will always
denote a finite-dimensional vector space over a field F.

The algebra A(V) has a unit element; for ease of notation we shall write
this as 1, and by the symbol $\lambda - T$, for $\lambda \in F$, $T \in A(V)$ we shall mean
$\lambda 1 - T$.

DEFINITION If $T \in A(V)$ then $\lambda \in F$ is called a characteristic root (or
eigenvalue) of T if $\lambda - T$ is singular.

We wish to characterize the property of being a characteristic root in the
behavior of T on V. We do this in

THEOREM 6.2.1 The element $\lambda \in F$ is a characteristic root of $T \in A(V)$ if
and only if for some $v \neq 0$ in V, $vT = \lambda v$.

Proof. If $\lambda$ is a characteristic root of T then $\lambda - T$ is singular, whence,
by Theorem 6.1.3, there is a vector $v \neq 0$ in V such that $v(\lambda - T) = 0$.
But then $\lambda v = vT$.

On the other hand, if $vT = \lambda v$ for some $v \neq 0$ in V, then $v(\lambda - T) = 0$,
whence, again by Theorem 6.1.3, $\lambda - T$ must be singular, and so, $\lambda$ is a
characteristic root of T.

LEMMA 6.2.1 If $\lambda \in F$ is a characteristic root of $T \in A(V)$, then for any
polynomial $q(x) \in F[x]$, $q(\lambda)$ is a characteristic root of q(T).

Proof. Suppose that $\lambda \in F$ is a characteristic root of T. By Theorem
6.2.1, there is a nonzero vector v in V such that $vT = \lambda v$. What about $vT^2$?

Now $vT^2 = (\lambda v)T = \lambda(vT) = \lambda(\lambda v) = \lambda^2 v$. Continuing in this way,
we obtain that $vT^k = \lambda^k v$ for all positive integers k. If $q(x) = \alpha_0 x^m +
\alpha_1 x^{m-1} + \cdots + \alpha_m$, $\alpha_i \in F$, then $q(T) = \alpha_0 T^m + \alpha_1 T^{m-1} + \cdots + \alpha_m$,
whence $vq(T) = v(\alpha_0 T^m + \alpha_1 T^{m-1} + \cdots + \alpha_m) = \alpha_0(vT^m) + \alpha_1(vT^{m-1}) +
\cdots + \alpha_m v = (\alpha_0 \lambda^m + \alpha_1 \lambda^{m-1} + \cdots + \alpha_m)v = q(\lambda)v$ by the remark made
above. Thus $v(q(\lambda) - q(T)) = 0$, hence, by Theorem 6.2.1, $q(\lambda)$ is a
characteristic root of q(T).
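
The identity $vq(T) = q(\lambda)v$ at the heart of this proof can be watched in a small case. The sketch below is ours: it assumes T is given by a 2 x 2 integer matrix acting on row vectors from the right, and the particular T, v, and q are arbitrary illustrations. As in the text, the coefficients of q are listed from the highest degree down.

```python
def mat_mul(M, N):
    """Ordinary matrix product."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def vec_mat(v, M):
    """The row vector vM."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]

def poly_at_matrix(coeffs, T):
    """Horner evaluation of q(T); coefficients are listed highest degree first."""
    n = len(T)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = mat_mul(result, T)
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

def poly_at_scalar(coeffs, x):
    """Horner evaluation of q(x) at a scalar."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

T = [[2, 0], [1, 3]]
v, lam = [1, 0], 2          # vT = 2v, so 2 is a characteristic root of T
q = [1, 1, 1]               # q(x) = x^2 + x + 1

print(vec_mat(v, T) == [lam * x for x in v])
print(vec_mat(v, poly_at_matrix(q, T)) == [poly_at_scalar(q, lam) * x for x in v])
```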

As an immediate consequence of Lemma 6.2.1, in fact as a mere special
case (but an extremely important one), we have

THEOREM 6.2.2 If $\lambda \in F$ is a characteristic root of $T \in A(V)$, then $\lambda$ is a
root of the minimal polynomial of T. In particular, T only has a finite number of
characteristic roots in F.

Proof. Let p(x) be the minimal polynomial over F of T; thus p(T) = 0.
If $\lambda \in F$ is a characteristic root of T, there is a $v \neq 0$ in V with $vT = \lambda v$.
As in the proof of Lemma 6.2.1, $vp(T) = p(\lambda)v$; but p(T) = 0, which
thus implies that $p(\lambda)v = 0$. Since $v \neq 0$, by the properties of a vector
space, we must have that $p(\lambda) = 0$. Therefore, $\lambda$ is a root of p(x). Since
p(x) has only a finite number of roots (in fact, since $\deg p(x) \leq n^2$ where
$n = \dim_F V$, p(x) has at most $n^2$ roots) in F, there can only be a finite
number of characteristic roots of T in F.

If $T \in A(V)$ and if $S \in A(V)$ is regular, then $(STS^{-1})^2 = STS^{-1}STS^{-1} =
ST^2S^{-1}$, $(STS^{-1})^3 = ST^3S^{-1}$, ..., $(STS^{-1})^i = ST^iS^{-1}$. Consequently,
for any $q(x) \in F[x]$, $q(STS^{-1}) = Sq(T)S^{-1}$. In particular, if q(T) = 0,
then $q(STS^{-1}) = 0$. Thus if p(x) is the minimal polynomial for T, then it
follows easily that p(x) is also the minimal polynomial for $STS^{-1}$. We have
proved

LEMMA 6.2.2 If $T, S \in A(V)$ and if S is regular, then T and $STS^{-1}$ have
the same minimal polynomial.
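
The key identity $q(STS^{-1}) = Sq(T)S^{-1}$ can be spot-checked numerically. The sketch below is ours and assumes 2 x 2 integer matrices; the particular T and S, and the inverse of S written out by hand, are illustrative choices only. Coefficients of q are listed from the highest degree down.

```python
def mat_mul(M, N):
    """Ordinary matrix product."""
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def poly_at_matrix(coeffs, T):
    """Horner evaluation of q(T); coefficients are listed highest degree first."""
    n = len(T)
    result = [[0] * n for _ in range(n)]
    for c in coeffs:
        result = mat_mul(result, T)
        result = [[result[i][j] + (c if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

T = [[1, 2], [3, 4]]
S = [[1, 1], [0, 1]]
S_inv = [[1, -1], [0, 1]]                       # inverse of S, written out by hand
conjugate = mat_mul(mat_mul(S, T), S_inv)       # S T S^{-1}

q = [1, 1, 1]                                   # an arbitrary polynomial x^2 + x + 1
p = [1, -5, -2]                                 # x^2 - 5x - 2, which T satisfies

print(poly_at_matrix(q, conjugate) == mat_mul(mat_mul(S, poly_at_matrix(q, T)), S_inv))
print(poly_at_matrix(p, T) == [[0, 0], [0, 0]])
print(poly_at_matrix(p, conjugate) == [[0, 0], [0, 0]])
```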

DEFINITION The element $0 \neq v \in V$ is called a characteristic vector of T
belonging to the characteristic root $\lambda \in F$ if $vT = \lambda v$.

What relation, if any, must exist between characteristic vectors of T 
belonging to different characteristic roots? This is answered in 

THEOREM 6.2.3 If $\lambda_1, \ldots, \lambda_k$ in F are distinct characteristic roots of $T \in
A(V)$ and if $v_1, \ldots, v_k$ are characteristic vectors of T belonging to $\lambda_1, \ldots, \lambda_k$,
respectively, then $v_1, \ldots, v_k$ are linearly independent over F.

Proof. For the theorem to require any proof, k must be larger than 1;
so we suppose that k > 1.

If $v_1, \ldots, v_k$ are linearly dependent over F, then there is a relation of the
form $\alpha_1 v_1 + \cdots + \alpha_k v_k = 0$, where $\alpha_1, \ldots, \alpha_k$ are all in F and not all of
them are 0. In all such relations, there is one having as few nonzero
coefficients as possible. By suitably renumbering the vectors, we can assume
this shortest relation to be

$$\beta_1 v_1 + \cdots + \beta_j v_j = 0, \qquad \beta_1 \neq 0, \ldots, \beta_j \neq 0. \tag{1}$$

We know that $v_iT = \lambda_i v_i$, so, applying T to equation (1), we obtain

$$\lambda_1 \beta_1 v_1 + \cdots + \lambda_j \beta_j v_j = 0. \tag{2}$$

Multiplying equation (1) by $\lambda_1$ and subtracting from equation (2), we
obtain

$$(\lambda_2 - \lambda_1)\beta_2 v_2 + \cdots + (\lambda_j - \lambda_1)\beta_j v_j = 0.$$

Now $\lambda_i - \lambda_1 \neq 0$ for i > 1, and $\beta_i \neq 0$, whence $(\lambda_i - \lambda_1)\beta_i \neq 0$. But
then we have produced a shorter relation than that in (1) between $v_1,
v_2, \ldots, v_k$. This contradiction proves the theorem.

COROLLARY 1 If $T \in A(V)$ and if $\dim_F V = n$ then T can have at most
n distinct characteristic roots in F.

Proof. Any set of linearly independent vectors in V can have at most n 
elements. Since any set of distinct characteristic roots of T, by Theorem 
6.2.3, gives rise to a corresponding set of linearly independent characteristic 
vectors, the corollary follows. 

COROLLARY 2 If $T \in A(V)$ and if $\dim_F V = n$, and if T has n distinct
characteristic roots in F, then there is a basis of V over F which consists of characteristic
vectors of T.

We leave the proof of this corollary to the reader. Corollary 2 is but the 
first of a whole class of theorems to come which will specify for us that a 
given linear transformation has a certain desirable basis of the vector space 
on which its action is easily describable. 
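
Theorem 6.2.3 and Corollary 2 can be checked numerically on a small case. The following Python sketch uses a hypothetical 2 x 2 example that is not taken from the text, with row vectors acted on from the right to match the convention vT used here. The helper names `apply` and `rank` are our own, and exact rational arithmetic is assumed.

```python
from fractions import Fraction

def apply(v, A):
    """Return vA for a row vector v (the right-action convention vT)."""
    return [sum(v[k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]

def rank(rows):
    """Rank of a list of row vectors, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical example: v1 T = 2 v1, v2 T = v1 + 3 v2.
A = [[2, 0],
     [1, 3]]
# (characteristic root, characteristic vector) pairs for this A.
pairs = [(2, [1, 0]), (3, [1, 1])]

for lam, v in pairs:
    assert apply(v, A) == [lam * x for x in v]   # vT = lambda v

vectors = [v for _, v in pairs]
independent = rank(vectors) == len(vectors)
print(independent)
```

Since the two roots are distinct, the two characteristic vectors have rank 2 and therefore form a basis of the two-dimensional space, as Corollary 2 predicts.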


Problems 

In all the problems V is a vector space over F. 

1. If $T \in A(V)$ and if $q(x) \in F[x]$ is such that $q(T) = 0$, is it true that
every root of $q(x)$ in F is a characteristic root of T? Either prove that
this is true or give an example to show that it is false.

2. If $T \in A(V)$ and if $p(x)$ is the minimal polynomial for T over F, suppose
that $p(x)$ has all its roots in F. Prove that every root of $p(x)$ is a
characteristic root of T.

3. Let V be two-dimensional over the field F of real numbers, with a
basis $v_1, v_2$. Find the characteristic roots and corresponding
characteristic vectors for T defined by

(a) $v_1 T = v_1 + v_2$, $v_2 T = v_1 - v_2$.

(b) $v_1 T = 5v_1 + 6v_2$, $v_2 T = -7v_2$.

(c) $v_1 T = v_1 + 2v_2$, $v_2 T = 3v_1 + 6v_2$.

4. Let V be as in Problem 3, and suppose that $T \in A(V)$ is such that
$v_1 T = \alpha v_1 + \beta v_2$, $v_2 T = \gamma v_1 + \delta v_2$, where $\alpha, \beta, \gamma, \delta$ are in F.

(a) Find necessary and sufficient conditions that 0 be a characteristic
root of T in terms of $\alpha, \beta, \gamma, \delta$.

(b) In terms of $\alpha, \beta, \gamma, \delta$ find necessary and sufficient conditions that
T have two distinct characteristic roots in F.

5. If V is two-dimensional over a field F, prove that every element in
A(V) satisfies a polynomial of degree 2 over F.

*6. If V is two-dimensional over F and if $S, T \in A(V)$, prove that
$(ST - TS)^2$ commutes with all elements of A(V).

7. Prove Corollary 2 to Theorem 6.2.3.

8. If V is n-dimensional over F and $T \in A(V)$ is nilpotent (i.e., $T^k = 0$
for some k), prove that $T^n = 0$. (Hint: If $v \in V$ use the fact that
$v, vT, vT^2, \ldots, vT^n$ must be linearly dependent over F.)


6.3 Matrices 

Although we have been discussing linear transformations for some time, it
has always been in a detached and impersonal way; to us a linear
transformation has been a symbol (very often T) which acts in a certain way on
a vector space. When one gets right down to it, outside of the few concrete
examples encountered in the problems, we have really never come face to
face with specific linear transformations. At the same time it is clear that
if one were to pursue the subject further there would often arise the need
of making a thorough and detailed study of a given linear transformation.
To mention one precise problem, presented with a linear transformation
(and suppose, for the moment, that we have a means of recognizing it),
how does one go about, in a "practical" and computable way, finding its
characteristic roots?

What we seek first is a simple notation, or, perhaps more accurately,
representation, for linear transformations. We shall accomplish this by
use of a particular basis of the vector space and by use of the action of a
linear transformation on this basis. Once this much is achieved, by means
of the operations in A(V) we can induce operations for the symbols created,
making of them an algebra. This new object, infused with an algebraic life
of its own, can be studied as a mathematical entity having an interest by
itself. This study is what comprises the subject of matrix theory.

However, to ignore the source of these matrices, that is, to investigate the
set of symbols independently of what they represent, can be costly, for we
would be throwing away a great deal of useful information. Instead we
shall always use the interplay between the abstract, A(V), and the concrete,
the matrix algebra, to obtain information one about the other.

Let V be an n-dimensional vector space over a field F and let
$v_1, \ldots, v_n$ be a basis of V over F. If $T \in A(V)$ then T is determined on
any vector as soon as we know its action on a basis of V. Since T maps V
into V, $v_1 T, v_2 T, \ldots, v_n T$ must all be in V. As elements of V, each of
these is realizable in a unique way as a linear combination of
$v_1, \ldots, v_n$ over F. Thus

$$
\begin{aligned}
v_1 T &= \alpha_{11} v_1 + \alpha_{12} v_2 + \cdots + \alpha_{1n} v_n \\
v_2 T &= \alpha_{21} v_1 + \alpha_{22} v_2 + \cdots + \alpha_{2n} v_n \\
&\;\;\vdots \\
v_i T &= \alpha_{i1} v_1 + \alpha_{i2} v_2 + \cdots + \alpha_{in} v_n \\
&\;\;\vdots \\
v_n T &= \alpha_{n1} v_1 + \alpha_{n2} v_2 + \cdots + \alpha_{nn} v_n,
\end{aligned}
$$

where each $\alpha_{ij} \in F$. This system of equations can be written more compactly as

$$v_i T = \sum_{j=1}^{n} \alpha_{ij} v_j, \qquad \text{for } i = 1, 2, \ldots, n.$$

The ordered set of $n^2$ numbers $\alpha_{ij}$ in F completely describes T. They will
serve as the means of representing T.

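DEFINITION Let V be an n-dimensional vector space over F and let
$v_1, \ldots, v_n$ be a basis for V over F. If $T \in A(V)$ then the matrix of T in the
basis $v_1, \ldots, v_n$, written as m(T), is

$$m(T) = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2n} \\ \vdots & \vdots & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix},$$

where $v_i T = \sum_j \alpha_{ij} v_j$.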

A matrix then is an ordered, square array of elements of F, with, as yet, 
no further properties, which represents the effect of a linear transformation 
on a given basis. 

Let us examine an example. Let F be a field and let V be the set of all
polynomials in x of degree n - 1 or less over F. On V let D be defined
by

$$(\beta_0 + \beta_1 x + \cdots + \beta_{n-1} x^{n-1})D = \beta_1 + 2\beta_2 x + \cdots + i\beta_i x^{i-1} + \cdots + (n-1)\beta_{n-1} x^{n-2}.$$

It is trivial that D is a linear transformation on V; in
fact, it is merely the differentiation operator.

What is the matrix of D? The question is meaningless unless we specify
a basis of V. Let us first compute the matrix of D in the basis $v_1 = 1$,
$v_2 = x$, $v_3 = x^2$, ..., $v_i = x^{i-1}$, ..., $v_n = x^{n-1}$. Now,

$$
\begin{aligned}
v_1 D &= 1D = 0 = 0v_1 + 0v_2 + \cdots + 0v_n \\
v_2 D &= xD = 1 = 1v_1 + 0v_2 + \cdots + 0v_n \\
&\;\;\vdots \\
v_i D &= x^{i-1}D = (i-1)x^{i-2} \\
      &= 0v_1 + 0v_2 + \cdots + 0v_{i-2} + (i-1)v_{i-1} + 0v_i + \cdots + 0v_n \\
&\;\;\vdots \\
v_n D &= x^{n-1}D = (n-1)x^{n-2} \\
      &= 0v_1 + 0v_2 + \cdots + 0v_{n-2} + (n-1)v_{n-1} + 0v_n.
\end{aligned}
$$

Going back to the very definition of the matrix of a linear transformation
in a given basis, we see the matrix of D in the basis $v_1, \ldots, v_n$, $m_1(D)$, is
in fact

$$m_1(D) = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & (n-1) & 0 \end{pmatrix}.$$


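However, there is nothing special about the basis we just used, or in how
we numbered its elements. Suppose we merely renumber the elements of
this basis; we then get an equally good basis $w_1 = x^{n-1}$, $w_2 = x^{n-2}$, ...,
$w_i = x^{n-i}$, ..., $w_n = 1$. What is the matrix of the same linear
transformation D in this basis? Now,

$$
\begin{aligned}
w_1 D &= x^{n-1}D = (n-1)x^{n-2} \\
      &= 0w_1 + (n-1)w_2 + 0w_3 + \cdots + 0w_n \\
&\;\;\vdots \\
w_i D &= x^{n-i}D = (n-i)x^{n-i-1} \\
      &= 0w_1 + \cdots + 0w_i + (n-i)w_{i+1} + 0w_{i+2} + \cdots + 0w_n \\
&\;\;\vdots \\
w_n D &= 1D = 0 = 0w_1 + 0w_2 + \cdots + 0w_n,
\end{aligned}
$$

whence $m_2(D)$, the matrix of D in this basis, is

$$m_2(D) = \begin{pmatrix} 0 & (n-1) & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & (n-2) & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & (n-3) & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix}.$$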
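
The two computations above can be reproduced mechanically. The following Python sketch is only an illustration under our own conventions: a polynomial is stored as its list of coefficients of $1, x, \ldots, x^{n-1}$, and the function names `differentiate` and `matrix_of_D` are hypothetical helpers, not notation from the text. Row i of the result holds the coordinates of the image of the ith basis vector, as in the definition of m(T).

```python
def differentiate(coeffs):
    """Derivative of b0 + b1 x + ... given as [b0, b1, ...], padded to the same length."""
    n = len(coeffs)
    return [i * coeffs[i] for i in range(1, n)] + [0]

def matrix_of_D(n, reverse=False):
    """Matrix of D on polynomials of degree < n.

    reverse=False uses the basis v_i = x^(i-1);
    reverse=True uses the renumbered basis w_i = x^(n-i).
    """
    exponents = list(range(n))
    if reverse:
        exponents.reverse()
    rows = []
    for e in exponents:
        basis_vector = [1 if k == e else 0 for k in range(n)]
        image = differentiate(basis_vector)          # coefficients of x^0 .. x^(n-1)
        rows.append([image[k] for k in exponents])   # coordinates in the chosen basis
    return rows

m1 = matrix_of_D(4)
m2 = matrix_of_D(4, reverse=True)
for row in m1:
    print(row)
for row in m2:
    print(row)
```

For n = 4 the first matrix has 1, 2, 3 just below the main diagonal and the second has 3, 2, 1 just above it, in agreement with the general forms of $m_1(D)$ and $m_2(D)$.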



Before leaving this example, let us compute the matrix of D in still another
basis of V over F. Let $u_1 = 1$, $u_2 = 1 + x$, $u_3 = 1 + x^2$, ..., $u_n = 1 + x^{n-1}$;
it is easy to verify that $u_1, \ldots, u_n$ form a basis of V over F. What is the
matrix of D in this basis? Since

$$
\begin{aligned}
u_1 D &= 1D = 0 = 0u_1 + 0u_2 + \cdots + 0u_n \\
u_2 D &= (1 + x)D = 1 = 1u_1 + 0u_2 + \cdots + 0u_n \\
u_3 D &= (1 + x^2)D = 2x = 2(u_2 - u_1) = -2u_1 + 2u_2 + 0u_3 + \cdots + 0u_n \\
&\;\;\vdots \\
u_n D &= (1 + x^{n-1})D = (n-1)x^{n-2} = (n-1)(u_{n-1} - u_1) \\
      &= -(n-1)u_1 + 0u_2 + \cdots + 0u_{n-2} + (n-1)u_{n-1} + 0u_n,
\end{aligned}
$$

the matrix, $m_3(D)$, of D in this basis is

$$m_3(D) = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ -2 & 2 & 0 & \cdots & 0 & 0 \\ -3 & 0 & 3 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ -(n-1) & 0 & 0 & \cdots & (n-1) & 0 \end{pmatrix}.$$

By the example worked out we see that the matrices of D, for the three
bases used, depended completely on the basis. Although different from each
other, they still represent the same linear transformation, D, and we could
reconstruct D from any of them if we knew the basis used in their
determination. However, although different, we might expect that some relationship
must hold between $m_1(D)$, $m_2(D)$, and $m_3(D)$. This exact relationship will
be determined later.

Since the basis used at any time is completely at our disposal, given a 
linear transformation T (whose definition, after all, does not depend on any 
basis) it is natural for us to seek a basis in which the matrix of T has a 
particularly nice form. For instance, if T is a linear transformation on V, 
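which is n-dimensional over F, and if T has n distinct characteristic roots
$\lambda_1, \ldots, \lambda_n$ in F, then by Corollary 2 to Theorem 6.2.3 we can find a basis
$v_1, \ldots, v_n$ of V over F such that $v_i T = \lambda_i v_i$. In this basis T has as matrix
the especially simple matrix,

$$m(T) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.$$
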

We have seen that once a basis of V is picked, to every linear
transformation we can associate a matrix. Conversely, having picked a fixed basis
$v_1, \ldots, v_n$ of V over F, a given matrix

$$\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}, \qquad \alpha_{ij} \in F,$$

gives rise to a linear transformation T defined on V by $v_i T = \sum_j \alpha_{ij} v_j$ on
this basis. Notice that the matrix of the linear transformation T, just
constructed, in the basis $v_1, \ldots, v_n$ is exactly the matrix with which we started.
Thus every possible square array serves as the matrix of some linear
transformation in the basis $v_1, \ldots, v_n$.

It is clear what is intended by the phrase the first row, second row, ...
of a matrix, and likewise by the first column, second column, .... In the
matrix

$$\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}$$

the element $\alpha_{ij}$ is in the ith row and jth column; we refer to it as the (i, j)
entry of the matrix.

To write out the whole square array of a matrix is somewhat awkward;
instead we shall always write a matrix as $(\alpha_{ij})$; this indicates that the (i, j)
entry of the matrix is $\alpha_{ij}$.

Suppose that V is an n-dimensional vector space over F and $v_1, \ldots, v_n$
is a basis of V over F which will remain fixed in the following discussion.
Suppose that S and T are linear transformations on V over F having matrices
$m(S) = (\sigma_{ij})$, $m(T) = (\tau_{ij})$, respectively, in the given basis. Our objective
is to transfer the algebraic structure of A(V) to the set of matrices having
entries in F.

To begin with, S = T if and only if vS = vT for any $v \in V$, hence, if
and only if $v_i S = v_i T$ for any $v_1, \ldots, v_n$ forming a basis of V over F.
Equivalently, S = T if and only if $\sigma_{ij} = \tau_{ij}$ for each i and j.

Given that $m(S) = (\sigma_{ij})$ and $m(T) = (\tau_{ij})$, can we explicitly write down
m(S + T)? Because $m(S) = (\sigma_{ij})$, $v_i S = \sum_j \sigma_{ij} v_j$; likewise, $v_i T = \sum_j \tau_{ij} v_j$,
whence

$$v_i(S + T) = v_i S + v_i T = \sum_j \sigma_{ij} v_j + \sum_j \tau_{ij} v_j = \sum_j (\sigma_{ij} + \tau_{ij}) v_j.$$

But then, by what is meant by the matrix of a linear transformation in a
given basis, $m(S + T) = (\lambda_{ij})$ where $\lambda_{ij} = \sigma_{ij} + \tau_{ij}$ for every i and j.
A computation of the same kind shows that for $\gamma \in F$, $m(\gamma S) = (\mu_{ij})$
where $\mu_{ij} = \gamma\sigma_{ij}$ for every i and j.

The most interesting, and complicated, computation is that of m(ST).
Now

$$v_i(ST) = (v_i S)T = \Bigl(\sum_k \sigma_{ik} v_k\Bigr)T = \sum_k \sigma_{ik} (v_k T).$$

However, $v_k T = \sum_j \tau_{kj} v_j$; substituting in the above formula yields

$$v_i(ST) = \sum_k \sigma_{ik} \Bigl(\sum_j \tau_{kj} v_j\Bigr) = \sum_j \Bigl(\sum_k \sigma_{ik}\tau_{kj}\Bigr) v_j.$$

(Prove!) Therefore, $m(ST) = (\nu_{ij})$, where for each i and j,
$\nu_{ij} = \sum_k \sigma_{ik}\tau_{kj}$.





At first glance the rule for computing the matrix of the product of two
linear transformations in a given basis seems complicated. However, note
that the (i, j) entry of m(ST) is obtained as follows: Consider the rows of
S as vectors and the columns of T as vectors; then the (i, j) entry of m(ST)
is merely the dot product of the ith row of S with the jth column of T.

Let us illustrate this with an example. Suppose that

$$m(S) = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

and

$$m(T) = \begin{pmatrix} -1 & 0 \\ 2 & 3 \end{pmatrix};$$

the dot product of the first row of S with the first column of T is (1)(-1) +
(2)(2) = 3, whence the (1, 1) entry of m(ST) is 3; the dot product of the
first row of S with the second column of T is (1)(0) + (2)(3) = 6, whence
the (1, 2) entry of m(ST) is 6; the dot product of the second row of S with
the first column of T is (3)(-1) + (4)(2) = 5, whence the (2, 1) entry of
m(ST) is 5; and, finally, the dot product of the second row of S with the
second column of T is (3)(0) + (4)(3) = 12, whence the (2, 2) entry of
m(ST) is 12. Thus

$$m(ST) = \begin{pmatrix} 3 & 6 \\ 5 & 12 \end{pmatrix}.$$
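
The row-by-column rule is short to state in code. The following Python sketch recomputes the product above and also checks, on one arbitrarily chosen row vector, that v(ST) = (vS)T under the right-action convention. The names `mat_mul` and `apply` are our own helpers, and matrices are assumed to be lists of rows.

```python
def mat_mul(S, T):
    """(i, j) entry = dot product of row i of S with column j of T."""
    inner = len(T)
    return [[sum(S[i][k] * T[k][j] for k in range(inner)) for j in range(len(T[0]))]
            for i in range(len(S))]

def apply(v, A):
    """Row vector v acted on from the right by the matrix A."""
    return [sum(v[k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]

mS = [[1, 2],
      [3, 4]]
mT = [[-1, 0],
      [2, 3]]

mST = mat_mul(mS, mT)
print(mST)

# With vectors written as rows, applying S and then T agrees with applying m(S)m(T).
v = [1, 1]
assert apply(v, mST) == apply(apply(v, mS), mT)
```

Note that the order of the factors matters: with the convention vT used here, the transformation ST (first S, then T) corresponds to m(S)m(T), in that order.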

The previous discussion has been intended to serve primarily as a
motivation for the constructions we are about to make.

Let F be a field; an n x n matrix over F will be a square array of elements
in F,

$$\begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1n} \\ \vdots & \vdots & & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix}$$

(which we write as $(\alpha_{ij})$). Let $F_n = \{(\alpha_{ij}) \mid \alpha_{ij} \in F\}$; in $F_n$ we want to
introduce the notion of equality of its elements, an addition, scalar
multiplication by elements of F and a multiplication so that it becomes an algebra
over F. We use the properties of m(T) for $T \in A(V)$ as our guide in this.

1. We declare $(\alpha_{ij}) = (\beta_{ij})$, for two matrices in $F_n$, if and only if
$\alpha_{ij} = \beta_{ij}$ for each i and j.

2. We define $(\alpha_{ij}) + (\beta_{ij}) = (\lambda_{ij})$ where $\lambda_{ij} = \alpha_{ij} + \beta_{ij}$ for every i, j.

3. We define, for $\gamma \in F$, $\gamma(\alpha_{ij}) = (\mu_{ij})$ where $\mu_{ij} = \gamma\alpha_{ij}$ for every i and j.

4. We define $(\alpha_{ij})(\beta_{ij}) = (\nu_{ij})$, where, for every i and j, $\nu_{ij} = \sum_k \alpha_{ik}\beta_{kj}$.

Let V be an n-dimensional vector space over F and let $v_1, \ldots, v_n$ be a
basis of V over F; the matrix, m(T), in the basis $v_1, \ldots, v_n$ associates with
$T \in A(V)$ an element, m(T), in $F_n$. Without further ado we claim that the
mapping from A(V) into $F_n$ defined by mapping T onto m(T) is an algebra
isomorphism of A(V) onto $F_n$. Because of this isomorphism, $F_n$ is an
associative algebra over F (as can also be verified directly). We call $F_n$
the algebra of all n x n matrices over F.

Every basis of V provides us with an algebra isomorphism of A(V) onto
$F_n$. It is a theorem that every algebra isomorphism of A(V) onto $F_n$ is so
obtainable.

In light of the very specific nature of the isomorphism between A(V) and
$F_n$, we shall often identify a linear transformation with its matrix, in some
basis, and A(V) with $F_n$. In fact, $F_n$ can be considered as A(V) acting on
the vector space $V = F^{(n)}$ of all n-tuples over F, where for the basis $v_1 =
(1, 0, \ldots, 0)$, $v_2 = (0, 1, 0, \ldots, 0)$, ..., $v_n = (0, 0, \ldots, 0, 1)$, $(\alpha_{ij}) \in F_n$
acts as $v_i(\alpha_{ij})$ = ith row of $(\alpha_{ij})$.
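
The last remark, that the ith standard basis vector is sent to the ith row, is easy to confirm on a small case. The following Python sketch does so for a hypothetical 3 x 3 integer matrix chosen only for illustration; `apply` and `standard_basis` are our own helper names.

```python
def apply(v, A):
    """Row vector v acted on from the right by the matrix A."""
    return [sum(v[k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]

def standard_basis(n):
    """The n-tuples (1,0,...,0), (0,1,0,...,0), ..., (0,...,0,1)."""
    return [[1 if j == i else 0 for j in range(n)] for i in range(n)]

# Hypothetical matrix, used only as an example.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

images = [apply(e, A) for e in standard_basis(3)]
print(images == A)
```

Because the images of the basis vectors are exactly the rows, the matrix of this action in the standard basis is the matrix we started with.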

We summarize what has been done in 

THEOREM 6.3.1 The set of all n x n matrices over F form an associative
algebra, $F_n$, over F. If V is an n-dimensional vector space over F, then A(V) and
$F_n$ are isomorphic as algebras over F. Given any basis $v_1, \ldots, v_n$ of V over F, if
for $T \in A(V)$, m(T) is the matrix of T in the basis $v_1, \ldots, v_n$, the mapping
$T \to m(T)$ provides an algebra isomorphism of A(V) onto $F_n$.

The zero under addition in $F_n$ is the zero-matrix all of whose entries are 0;
we shall often write it merely as 0. The unit matrix, which is the unit element
of $F_n$ under multiplication, is the matrix whose diagonal entries are 1 and
whose entries elsewhere are 0; we shall write it as I, $I_n$ (when we wish to
emphasize the size of matrices), or merely as 1. For $\alpha \in F$, the matrices

$$\alpha I = \begin{pmatrix} \alpha & & & \\ & \alpha & & \\ & & \ddots & \\ & & & \alpha \end{pmatrix}$$

(blank spaces indicate only 0 entries) are called scalar matrices. Because of the
isomorphism between A(V) and $F_n$, it is clear that $T \in A(V)$ is invertible
if and only if m(T), as a matrix, has an inverse in $F_n$.

Given a linear transformation $T \in A(V)$, if we pick two bases, $v_1, \ldots, v_n$
and $w_1, \ldots, w_n$ of V over F, each gives rise to a matrix, namely, $m_1(T)$ and
$m_2(T)$, the matrices of T in the bases $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$,
respectively. As matrices, that is, as elements of the matrix algebra $F_n$, what is
the relationship between $m_1(T)$ and $m_2(T)$?

THEOREM 6.3.2 If V is n-dimensional over F and if $T \in A(V)$ has the
matrix $m_1(T)$ in the basis $v_1, \ldots, v_n$ and the matrix $m_2(T)$ in the basis $w_1, \ldots, w_n$
of V over F, then there is an element $C \in F_n$ such that $m_2(T) = Cm_1(T)C^{-1}$.
In fact, if S is the linear transformation of V defined by $v_i S = w_i$ for $i = 1, 2, \ldots, n$,
then C can be chosen to be $m_1(S)$.

Proof. Let $m_1(T) = (\alpha_{ij})$ and $m_2(T) = (\beta_{ij})$; thus $v_i T = \sum_j \alpha_{ij} v_j$,
$w_i T = \sum_j \beta_{ij} w_j$.

Let S be the linear transformation on V defined by $v_i S = w_i$. Since
$v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ are bases of V over F, S maps V onto V, hence,
by Theorem 6.1.4, S is invertible in A(V).

Now $w_i T = \sum_j \beta_{ij} w_j$; since $w_i = v_i S$, on substituting this in the
expression for $w_i T$ we obtain $(v_i S)T = \sum_j \beta_{ij}(v_j S)$. But then $v_i(ST) =
(\sum_j \beta_{ij} v_j)S$; since S is invertible, this further simplifies to $v_i(STS^{-1}) =
\sum_j \beta_{ij} v_j$. By the very definition of the matrix of a linear transformation in
a given basis, $m_1(STS^{-1}) = (\beta_{ij}) = m_2(T)$. However, the mapping
$T \to m_1(T)$ is an isomorphism of A(V) onto $F_n$; therefore, $m_1(STS^{-1}) =
m_1(S)m_1(T)m_1(S^{-1}) = m_1(S)m_1(T)m_1(S)^{-1}$. Putting the pieces together,
we obtain $m_2(T) = m_1(S)m_1(T)m_1(S)^{-1}$, which is exactly what is claimed
in the theorem.

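We illustrate this last theorem with the example of the matrix of D, in
various bases, worked out earlier. To minimize the computation, suppose
that V is the vector space of all polynomials over F of degree 3 or less, and let
D be the differentiation operator defined by $(\alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3)D =
\alpha_1 + 2\alpha_2 x + 3\alpha_3 x^2$.

As we saw earlier, in the basis $v_1 = 1$, $v_2 = x$, $v_3 = x^2$, $v_4 = x^3$, the
matrix of D is

$$m_1(D) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \end{pmatrix}.$$

In the basis $u_1 = 1$, $u_2 = 1 + x$, $u_3 = 1 + x^2$, $u_4 = 1 + x^3$ (which plays
the role of $w_1, \ldots, w_4$ in the theorem), the matrix of D is

$$m_2(D) = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ -2 & 2 & 0 & 0 \\ -3 & 0 & 3 & 0 \end{pmatrix}.$$

Let S be the linear transformation of V defined by $v_1 S = w_1$ $(= v_1)$,
$v_2 S = w_2 = 1 + x = v_1 + v_2$, $v_3 S = w_3 = 1 + x^2 = v_1 + v_3$, and also
$v_4 S = w_4 = 1 + x^3 = v_1 + v_4$. The matrix of S in the basis $v_1, v_2, v_3, v_4$
is

$$C = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}.$$

A simple computation shows that

$$C^{-1} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}.$$

Then

$$
\begin{aligned}
Cm_1(D)C^{-1} &= \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ -1 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix} \\
&= \begin{pmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ -2 & 2 & 0 & 0 \\ -3 & 0 & 3 & 0 \end{pmatrix} = m_2(D),
\end{aligned}
$$

as it should be, according to the theorem. (Verify all the computations
used!)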
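
The verification requested above can also be carried out by machine. The following Python sketch computes $C^{-1}$ by Gauss-Jordan elimination over the rationals and then forms $Cm_1(D)C^{-1}$. The helper names `mat_mul` and `inverse` are our own, and the elimination routine is just one convenient way to invert a small matrix, not a method prescribed by the text.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Row-by-column product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(A):
    """Gauss-Jordan inverse over the rationals; raises ValueError if A is singular."""
    n = len(A)
    # Augment A with the identity matrix.
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[c], m[pivot] = m[pivot], m[c]
        p = m[c][c]
        m[c] = [x / p for x in m[c]]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [row[n:] for row in m]

m1 = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 2, 0, 0], [0, 0, 3, 0]]   # D in the basis 1, x, x^2, x^3
C = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]]     # matrix of S in the same basis

C_inv = inverse(C)
m2 = mat_mul(mat_mul(C, m1), C_inv)
for row in m2:
    print([int(x) for x in row])
```

The printed rows should agree with the matrix of D in the basis 1, 1 + x, 1 + x^2, 1 + x^3 found by direct computation.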

The theorem asserts that, knowing the matrix of a linear transformation 
in any one basis allows us to compute it in any other, as long as we know the 
linear transformation (or matrix) of the change of basis. 

We still have not answered the question: Given a linear transformation, 
how does one compute its characteristic roots? This will come later. From 
the matrix of a linear transformation we shall show how to construct a 
polynomial whose roots are precisely the characteristic roots of the linear 
transformation. 


Problems 


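1. Compute the following matrix products:

[The matrices for this problem did not survive extraction. The only fully legible factor is the 3 x 3 matrix with rows (1, 0, 1), (0, 2, 3), (-1, -1, -1).]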


2. Verify all the computations made in the example illustrating Theorem
6.3.2.

3. In $F_n$ prove directly, using the definitions of sum and product, that

(a) A(B + C) = AB + AC;

(b) (AB)C = A(BC);
for $A, B, C \in F_n$.

4. In $F_2$ prove that for any two elements A and B, $(AB - BA)^2$ is a
scalar matrix.

5. Let V be the vector space of polynomials of degree 3 or less over F.
In V define T by $(\alpha_0 + \alpha_1 x + \alpha_2 x^2 + \alpha_3 x^3)T = \alpha_0 + \alpha_1(x + 1) +
\alpha_2(x + 1)^2 + \alpha_3(x + 1)^3$. Compute the matrix of T in the basis

(a) $1, x, x^2, x^3$.

(b) $1, 1 + x, 1 + x^2, 1 + x^3$.

(c) If the matrix in part (a) is A and that in part (b) is B, find a
matrix C so that $B = CAC^{-1}$.

6. Let $V = F^{(3)}$ and suppose that

$$\begin{pmatrix} 1 & 1 & 2 \\ -1 & 2 & 1 \\ 0 & 1 & 3 \end{pmatrix}$$

is the matrix of $T \in A(V)$ in the basis $v_1 = (1, 0, 0)$, $v_2 = (0, 1, 0)$,
$v_3 = (0, 0, 1)$. Find the matrix of T in the basis

(a) $u_1 = (1, 1, 1)$, $u_2 = (0, 1, 1)$, $u_3 = (0, 0, 1)$.

(b) $u_1 = (1, 1, 0)$, $u_2 = (1, 2, 0)$, $u_3 = (1, 2, 1)$.

7. Prove that, given the matrix

$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 6 & -11 & 6 \end{pmatrix} \in F_3$$

(where the characteristic of F is not 2), then

(a) $A^3 - 6A^2 + 11A - 6 = 0$.

(b) There exists a matrix $C \in F_3$ such that

$$CAC^{-1} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}.$$


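8. Prove that it is impossible to find a matrix $C \in F_2$ such that

$$C \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} C^{-1} = \begin{pmatrix} \alpha & 0 \\ 0 & \beta \end{pmatrix}$$

for any $\alpha, \beta \in F$.
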

9. A matrix $A \in F_n$ is said to be a diagonal matrix if all the entries off
the main diagonal of A are 0, i.e., if $A = (\alpha_{ij})$ and $\alpha_{ij} = 0$ for $i \neq j$.
If A is a diagonal matrix all of whose entries on the main diagonal
are distinct, find all the matrices $B \in F_n$ which commute with A, that is,
all matrices B such that BA = AB.

10. Using the result of Problem 9, prove that the only matrices in F n 
which commute with all matrices in F n are the scalar matrices. 

11. Let $A \in F_n$ be the matrix

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 0 \end{pmatrix},$$

whose entries everywhere, except on the superdiagonal, are 0, and
whose entries on the superdiagonal are 1's. Prove $A^n = 0$ but $A^{n-1} \neq 0$.

*12. If A is as in Problem 11, find all matrices in $F_n$ which commute with
A and show that they must be of the form $\alpha_0 + \alpha_1 A + \alpha_2 A^2 + \cdots +
\alpha_{n-1} A^{n-1}$ where $\alpha_0, \alpha_1, \ldots, \alpha_{n-1} \in F$.

13. Let $A \in F_2$ and let $C(A) = \{B \in F_2 \mid AB = BA\}$. Let $C(C(A)) =
\{G \in F_2 \mid GX = XG \text{ for all } X \in C(A)\}$. Prove that if $G \in C(C(A))$ then
G is of the form $\alpha_0 + \alpha_1 A$, $\alpha_0, \alpha_1 \in F$.

14. Do Problem 13 for $A \in F_3$, proving that every $G \in C(C(A))$ is of
the form $\alpha_0 + \alpha_1 A + \alpha_2 A^2$.

15. In $F_n$ let the matrices $E_{ij}$ be defined as follows: $E_{ij}$ is the matrix
whose only nonzero entry is the (i, j) entry, which is 1. Prove

(a) The $E_{ij}$ form a basis of $F_n$ over F.

(b) $E_{ij}E_{kl} = 0$ for $j \neq k$; $E_{ij}E_{jl} = E_{il}$.

(c) Given i, j, there exists a matrix C such that $CE_{ii}C^{-1} = E_{jj}$.

(d) If $i \neq j$ there exists a matrix C such that $CE_{ij}C^{-1} = E_{12}$.

(e) Find all $B \in F_n$ commuting with $E_{12}$.

(f) Find all $B \in F_n$ commuting with $E_{11}$.

16. Let F be the field of real numbers and let C be the field of complex 
numbers. For $a \in C$ let $T_a : C \to C$ by $xT_a = xa$ for all $x \in C$. Using
the basis 1, i find the matrix of the linear transformation $T_a$ and so get
an isomorphic representation of the complex numbers as 2 x 2 
matrices over the real field. 

17. Let Q be the division ring of quaternions over the real field. Using 
the basis 1, i,j, k of Q over F, proceed as in Problem 16 to find an 
isomorphic representation of Q by 4 x 4 matrices over the field of 
real numbers. 

*18. Combine the results of Problems 16 and 17 to find an isomorphic
representation of Q as 2 x 2 matrices over the field of complex
numbers.

19. Let $\mathcal{M}$ be the set of all n x n matrices having entries 0 and 1 in such
a way that there is one 1 in each row and column. (Such matrices
are called permutation matrices.)

(a) If $M \in \mathcal{M}$, describe AM in terms of the rows and columns of A.

(b) If $M \in \mathcal{M}$, describe MA in terms of the rows and columns of A.

20. Let $\mathcal{M}$ be as in Problem 19. Prove

(a) $\mathcal{M}$ has n! elements.

(b) If $M \in \mathcal{M}$, then it is invertible and its inverse is again in $\mathcal{M}$.

(c) Give the explicit form of the inverse of M.

(d) Prove that $\mathcal{M}$ is a group under matrix multiplication.

(e) Prove that $\mathcal{M}$ is isomorphic, as a group, to $S_n$, the symmetric
group of degree n.

21. Let $A = (\alpha_{ij})$ be such that for each i, $\sum_j \alpha_{ij} = 1$. Prove that 1 is
a characteristic root of A (that is, 1 - A is not invertible).

22. Let $A = (\alpha_{ij})$ be such that for every j, $\sum_i \alpha_{ij} = 1$. Prove that 1 is
a characteristic root of A.

23. Find necessary and sufficient conditions on $\alpha, \beta, \gamma, \delta$, so that

$$A = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$$

is invertible. When it is invertible, write down $A^{-1}$
explicitly.

24. If $E \in F_n$ is such that $E^2 = E \neq 0$ prove that there is a matrix
$C \in F_n$ such that

$$CEC^{-1} = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix},$$

where the unit matrix in the top left corner is r x r, where r is the
rank of E.

25. If F is the real field, prove that it is impossible to find matrices 
A, B e F n such that AB — BA = 1. 

26. If F is of characteristic 2, prove that in $F_2$ it is possible to find matrices
A, B such that AB - BA = 1.

27. The matrix A is called triangular if all the entries above the main 
diagonal are 0. (If all the entries below the main diagonal are 0 the 
matrix is also called triangular). 

(a) If A is triangular and no entry on the main diagonal is 0, prove 
that A is invertible. 

(b) If A is triangular and an entry on the main diagonal is 0, prove
that A is singular.

28. If A is triangular, prove that its characteristic roots are precisely the
elements on its main diagonal.

29. If $N^k = 0$, $N \in F_n$, prove that 1 + N is invertible and find its inverse
as a polynomial in N.

30. If $A \in F_n$ is triangular and all the entries on its main diagonal are 0,
prove that $A^n = 0$.

31. If $A \in F_n$ is triangular and all the entries on its main diagonal are
equal to $\alpha \neq 0 \in F$, find $A^{-1}$.

32. Let S, T be linear transformations on V such that the matrix of S
in one basis is equal to the matrix of T in another. Prove there exists
a linear transformation A on V such that $T = ASA^{-1}$.

6.4 Canonical Forms: Triangular Form 

Let V be an n-dimensional vector space over a field F.

DEFINITION The linear transformations $S, T \in A(V)$ are said to be
similar if there exists an invertible element $C \in A(V)$ such that $T = CSC^{-1}$.

In view of the results of Section 6.3, this definition translates into one
about matrices. In fact, since $F_n$ acts as A(V) on $F^{(n)}$, the above definition
already defines similarity of matrices. By it, $A, B \in F_n$ are similar if there
is an invertible $C \in F_n$ such that $B = CAC^{-1}$.

The relation on A(V) defined by similarity is an equivalence relation; 
the equivalence class of an element will be called its similarity class. Given 
two linear transformations, how can we determine whether or not they are 
similar? Of course, we could scan the similarity class of one of these to see
if the other is in it, but this procedure is not a feasible one. Instead we try 
to establish some kind of landmark in each similarity class and a way of 
going from any element in the class to this landmark. We shall prove the 
existence of linear transformations in each similarity class whose matrix, 
in some basis, is of a particularly nice form. These matrices will be called 
the canonical forms. To determine if two linear transformations are similar, 
we need but compute a particular canonical form for each and check if 
these are the same. 

There are many possible canonical forms; we shall only consider three of 
these, namely, the triangular form, Jordan form, and the rational canonical 
form, in this and the next three sections. 

DEFINITION The subspace W of V is invariant under $T \in A(V)$ if
$WT \subset W$.

LEMMA 6.4.1 If $W \subset V$ is invariant under T, then T induces a linear
transformation $\bar{T}$ on V/W, defined by $(v + W)\bar{T} = vT + W$. If T satisfies
the polynomial $q(x) \in F[x]$, then so does $\bar{T}$. If $p_1(x)$ is the minimal polynomial
for $\bar{T}$ over F and if $p(x)$ is that for T, then $p_1(x) \mid p(x)$.

Proof. Let $\bar{V} = V/W$; the elements of $\bar{V}$ are, of course, the cosets
v + W of W in V. Given $\bar{v} = v + W \in \bar{V}$ define $\bar{v}\bar{T} = vT + W$. To
verify that $\bar{T}$ has all the formal properties of a linear transformation on $\bar{V}$
is an easy matter once it has been established that $\bar{T}$ is well defined on $\bar{V}$. We
thus content ourselves with proving this fact.

Suppose that $\bar{v} = v_1 + W = v_2 + W$ where $v_1, v_2 \in V$. We must show
that $v_1 T + W = v_2 T + W$. Since $v_1 + W = v_2 + W$, $v_1 - v_2$ must be
in W, and since W is invariant under T, $(v_1 - v_2)T$ must also be in W.
Consequently $v_1 T - v_2 T \in W$, from which it follows that $v_1 T + W =
v_2 T + W$, as desired. We now know that $\bar{T}$ defines a linear transformation
on $\bar{V} = V/W$.

If $\bar{v} = v + W \in \bar{V}$, then $\bar{v}\,\overline{(T^2)} = vT^2 + W = (vT)T + W =
(vT + W)\bar{T} = ((v + W)\bar{T})\bar{T} = \bar{v}(\bar{T})^2$; thus $\overline{(T^2)} = (\bar{T})^2$. Similarly,
$\overline{(T^k)} = (\bar{T})^k$ for any $k \geq 0$. Consequently, for any polynomial $q(x) \in
F[x]$, $\overline{q(T)} = q(\bar{T})$. For any $q(x) \in F[x]$ with $q(T) = 0$, since $\bar{0}$ is the
zero transformation on $\bar{V}$, $0 = \overline{q(T)} = q(\bar{T})$.

Let $p_1(x)$ be the minimal polynomial over F satisfied by $\bar{T}$. If $q(\bar{T}) = 0$
for $q(x) \in F[x]$, then $p_1(x) \mid q(x)$. If $p(x)$ is the minimal polynomial for T
over F, then $p(T) = 0$, whence $p(\bar{T}) = 0$; in consequence, $p_1(x) \mid p(x)$.

As we saw in Theorem 6.2.2, all the characteristic roots of T which lie 
in F are roots of the minimal polynomial of T over F. We say that all the 
characteristic roots of T are in F if all the roots of the minimal polynomial of T 
over F lie in F. 

In Problem 27 at the end of the last section, we defined a matrix as being 
triangular if all its entries above the main diagonal were 0. Equivalently, if 
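T is a linear transformation on V over F, the matrix of T in the basis
$v_1, \ldots, v_n$ is triangular if

$$
\begin{aligned}
v_1 T &= \alpha_{11} v_1 \\
v_2 T &= \alpha_{21} v_1 + \alpha_{22} v_2 \\
&\;\;\vdots \\
v_i T &= \alpha_{i1} v_1 + \alpha_{i2} v_2 + \cdots + \alpha_{ii} v_i \\
&\;\;\vdots \\
v_n T &= \alpha_{n1} v_1 + \cdots + \alpha_{nn} v_n,
\end{aligned}
$$

i.e., if $v_i T$ is a linear combination only of $v_i$ and its predecessors in the basis.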

THEOREM 6.4.1 If $T \in A(V)$ has all its characteristic roots in F, then there
is a basis of V in which the matrix of T is triangular.

Proof. The proof goes by induction on the dimension of V over F.

If $\dim_F V = 1$, then every element in A(V) is a scalar, and so the
theorem is true here.

Suppose that the theorem is true for all vector spaces over F of dimension
n - 1, and let V be of dimension n over F.

The linear transformation T on V has all its characteristic roots in F;
let $\lambda_1 \in F$ be a characteristic root of T. There exists a nonzero vector $v_1$
in V such that $v_1 T = \lambda_1 v_1$. Let $W = \{\alpha v_1 \mid \alpha \in F\}$; W is a one-dimensional
subspace of V, and is invariant under T. Let $\bar{V} = V/W$; by Lemma 4.2.6,
$\dim \bar{V} = \dim V - \dim W = n - 1$. By Lemma 6.4.1, T induces a
linear transformation $\bar{T}$ on $\bar{V}$ whose minimal polynomial over F divides
the minimal polynomial of T over F. Thus all the roots of the minimal
polynomial of $\bar{T}$, being roots of the minimal polynomial of T, must lie in F.
The linear transformation $\bar{T}$ in its action on $\bar{V}$ satisfies the hypothesis of
the theorem; since $\bar{V}$ is (n - 1)-dimensional over F, by our induction
hypothesis, there is a basis $\bar{v}_2, \bar{v}_3, \ldots, \bar{v}_n$ of $\bar{V}$ over F such that

$$
\begin{aligned}
\bar{v}_2 \bar{T} &= \alpha_{22} \bar{v}_2 \\
\bar{v}_3 \bar{T} &= \alpha_{32} \bar{v}_2 + \alpha_{33} \bar{v}_3 \\
&\;\;\vdots \\
\bar{v}_i \bar{T} &= \alpha_{i2} \bar{v}_2 + \alpha_{i3} \bar{v}_3 + \cdots + \alpha_{ii} \bar{v}_i \\
&\;\;\vdots \\
\bar{v}_n \bar{T} &= \alpha_{n2} \bar{v}_2 + \alpha_{n3} \bar{v}_3 + \cdots + \alpha_{nn} \bar{v}_n.
\end{aligned}
$$

Let $v_2, \ldots, v_n$ be elements of V mapping into $\bar{v}_2, \ldots, \bar{v}_n$, respectively.
Then $v_1, v_2, \ldots, v_n$ form a basis of V (see Problem 3, end of this section).
Since $\bar{v}_2 \bar{T} = \alpha_{22} \bar{v}_2$, $\bar{v}_2 \bar{T} - \alpha_{22} \bar{v}_2 = 0$, whence $v_2 T - \alpha_{22} v_2$ must be in W.
Thus $v_2 T - \alpha_{22} v_2$ is a multiple of $v_1$, say $\alpha_{21} v_1$, yielding, after transposing,
$v_2 T = \alpha_{21} v_1 + \alpha_{22} v_2$. Similarly, $v_i T - \alpha_{i2} v_2 - \alpha_{i3} v_3 - \cdots - \alpha_{ii} v_i \in W$,
whence $v_i T = \alpha_{i1} v_1 + \alpha_{i2} v_2 + \cdots + \alpha_{ii} v_i$. The basis $v_1, \ldots, v_n$ of V over
F provides us with a basis where every $v_i T$ is a linear combination of $v_i$
and its predecessors in the basis. Therefore, the matrix of T in this basis
is triangular. This completes the induction and proves the theorem.

We wish to restate Theorem 6.4.1 for matrices. Suppose that the matrix
$A \in F_n$ has all its characteristic roots in F. A defines a linear
transformation T on $F^{(n)}$ whose matrix in the basis

$$v_1 = (1, 0, \ldots, 0),\; v_2 = (0, 1, 0, \ldots, 0),\; \ldots,\; v_n = (0, 0, \ldots, 0, 1),$$

is precisely A. The characteristic roots of T, being equal to those of A, are
all in F, whence by Theorem 6.4.1, there is a basis of $F^{(n)}$ in which the
matrix of T is triangular. However, by Theorem 6.3.2, this change of basis
merely changes the matrix of T, namely A, in the first basis, into $CAC^{-1}$
for a suitable $C \in F_n$. Thus

ALTERNATIVE FORM OF THEOREM 6.4.1 If the matrix $A \in F_n$ has
all its characteristic roots in F, then there is a matrix $C \in F_n$ such that $CAC^{-1}$ is
a triangular matrix.

Theorem 6.4.1 (in either form) is usually described by saying that T
(or A) can be brought to triangular form over F.

If we glance back at Problem 28 at the end of Section 6.3, we see that 
after T has been brought to triangular form, the elements on the main 
diagonal of its matrix play the following significant role: they are precisely 
the characteristic roots of T. 

We conclude the section with 

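THEOREM 6.4.2 If $V$ is $n$-dimensional over $F$ and if $T \in A(V)$ has all its
characteristic roots in $F$, then $T$ satisfies a polynomial of degree $n$ over $F$.

Proof. By Theorem 6.4.1, we can find a basis $v_1, \ldots, v_n$ of $V$ over $F$
such that:

\[
\begin{aligned}
v_1 T &= \lambda_1 v_1 \\
v_2 T &= \alpha_{21} v_1 + \lambda_2 v_2 \\
&\;\;\vdots \\
v_i T &= \alpha_{i1} v_1 + \cdots + \alpha_{i,i-1} v_{i-1} + \lambda_i v_i,
\end{aligned}
\]

for $i = 1, 2, \ldots, n$.

Equivalently

\[
\begin{aligned}
v_1 (T - \lambda_1) &= 0 \\
v_2 (T - \lambda_2) &= \alpha_{21} v_1 \\
&\;\;\vdots \\
v_i (T - \lambda_i) &= \alpha_{i1} v_1 + \cdots + \alpha_{i,i-1} v_{i-1},
\end{aligned}
\]

for $i = 1, 2, \ldots, n$.

What is $v_2 (T - \lambda_2)(T - \lambda_1)$? As a result of $v_2 (T - \lambda_2) = \alpha_{21} v_1$ and
$v_1 (T - \lambda_1) = 0$, we obtain $v_2 (T - \lambda_2)(T - \lambda_1) = 0$. Since

\[ (T - \lambda_2)(T - \lambda_1) = (T - \lambda_1)(T - \lambda_2), \]
\[ v_1 (T - \lambda_2)(T - \lambda_1) = v_1 (T - \lambda_1)(T - \lambda_2) = 0. \]

Continuing this type of computation yields

\[
\begin{aligned}
v_1 (T - \lambda_i)(T - \lambda_{i-1}) \cdots (T - \lambda_1) &= 0, \\
v_2 (T - \lambda_i)(T - \lambda_{i-1}) \cdots (T - \lambda_1) &= 0, \\
&\;\;\vdots \\
v_i (T - \lambda_i)(T - \lambda_{i-1}) \cdots (T - \lambda_1) &= 0.
\end{aligned}
\]

For $i = n$, the matrix $S = (T - \lambda_n)(T - \lambda_{n-1}) \cdots (T - \lambda_1)$ satisfies
$v_1 S = v_2 S = \cdots = v_n S = 0$. Then, since $S$ annihilates a basis of $V$, $S$ must
annihilate all of $V$. Therefore, $S = 0$. Consequently, $T$ satisfies the polynomial
$(x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$ in $F[x]$ of degree $n$, proving the
theorem.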
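The computation in the proof can be checked numerically. The following is a minimal sketch, assuming a hypothetical 3 x 3 lower triangular matrix `A` (lower triangular because, in the row-vector convention used here, row $i$ lists the coefficients of $v_i T$ in terms of $v_1, \ldots, v_i$); the helper names `mat_mul` and `minus_scalar` are illustrative only.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def minus_scalar(a, lam):
    """Return a - lam * I."""
    n = len(a)
    return [[a[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


# Hypothetical sample: a triangular matrix with diagonal entries 2, 3, 2.
A = [[2, 0, 0],
     [5, 3, 0],
     [7, 1, 2]]
roots = [A[i][i] for i in range(len(A))]  # lambda_1, ..., lambda_n

# S = (A - lambda_n)(A - lambda_{n-1}) ... (A - lambda_1), as in the proof.
S = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
for lam in reversed(roots):
    S = mat_mul(S, minus_scalar(A, lam))

is_zero = all(entry == 0 for row in S for entry in row)
```

Since the factors $A - \lambda_i$ commute, the order of multiplication does not matter; the loop simply follows the order written in the proof.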

Unfortunately, it is in the nature of things that not every linear transformation
on a vector space over every field $F$ has all its characteristic roots
in $F$. This depends totally on the field $F$. For instance, if $F$ is the field of
real numbers, then the minimal equation of

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]

over $F$ is $x^2 + 1$, which has no roots in $F$. Thus we have no right to assume
that characteristic roots always lie in the field in question. However, we
may ask, can we slightly enlarge $F$ to a new field $K$ so that everything works
all right over $K$?
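As a small illustration of this phenomenon, the sketch below assumes the matrix in question is the 2 x 2 real matrix with rows $(0, 1)$ and $(-1, 0)$ (the same matrix that appears in Problem 14). It checks that the matrix satisfies $x^2 + 1$, that $\det(\lambda - J) = \lambda^2 + 1$ is positive at a few real sample points, and that it vanishes at $\pm i$ once the field is enlarged to the complex numbers. The names `J` and `det_shift` are illustrative.

```python
# Assumed example matrix: rows (0, 1) and (-1, 0).
J = [[0, 1], [-1, 0]]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# J satisfies x^2 + 1: J^2 + I is the zero matrix.
J2 = mat_mul(J, J)
J2_plus_I = [[J2[i][j] + (1 if i == j else 0) for j in range(2)]
             for i in range(2)]


def det_shift(lam):
    """det(lam * I - J) for the 2 x 2 matrix J; equals lam^2 + 1 here."""
    return (lam - J[0][0]) * (lam - J[1][1]) - J[0][1] * J[1][0]


# No real sample point is a root, but the complex numbers i and -i are.
real_values = [det_shift(x) for x in range(-3, 4)]
complex_values = [det_shift(1j), det_shift(-1j)]
```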

The discussion will be made for matrices; it could be carried out equally
well for linear transformations. What would be needed would be the following:
given a vector space $V$ over a field $F$ of dimension $n$, and given an
extension $K$ of $F$, then we can embed $V$ into a vector space $V_K$ over $K$ of
dimension $n$ over $K$. One way of doing this would be to take a basis $v_1, \ldots, v_n$
of $V$ over $F$ and to consider $V_K$ as the set of all $\alpha_1 v_1 + \cdots + \alpha_n v_n$ with
the $\alpha_i \in K$, considering the $v_i$ linearly independent over $K$. This heavy use
of a basis is unaesthetic; the whole thing can be done in a basis-free way
by introducing the concept of tensor product of vector spaces. We shall not
do it here; instead we argue with matrices (which is effectively the route
outlined above using a fixed basis of $V$).

Consider the algebra $F_n$. If $K$ is any extension field of $F$, then $F_n \subset K_n$,
the set of $n \times n$ matrices over $K$. Thus any matrix over $F$ can be considered
as a matrix over $K$. If $T \in F_n$ has the minimal polynomial $p(x)$ over $F$,
considered as an element of $K_n$ it might conceivably satisfy a different
polynomial $p_0(x)$ over $K$. But then $p_0(x) \mid p(x)$, since $p_0(x)$ divides all
polynomials over $K$ (and hence all polynomials over $F$) which are satisfied
by $T$. We now specialize $K$. By Theorem 5.3.2 there is a finite extension,
$K$, of $F$ in which the minimal polynomial, $p(x)$, for $T$ over $F$ has all its roots.
As an element of $K_n$, for this $K$, does $T$ have all its characteristic roots in
$K$? As an element of $K_n$, the minimal polynomial for $T$ over $K$, $p_0(x)$,
divides $p(x)$, so all the roots of $p_0(x)$ are roots of $p(x)$ and therefore lie in $K$.
Consequently, as an element in $K_n$, $T$ has all its characteristic roots in $K$.

Thus, given T in F n , by going to the splitting field, K, of its minimal 
polynomial we achieve the situation where the hypotheses of Theorems 6.4.1 
and 6.4.2 are satisfied, not over F, but over K. Therefore, for instance, T 
can be brought to triangular form over K and satisfies a polynomial of 
degree n over K. Sometimes, when luck is with us, knowing that a certain 
result is true over K we can “cut back” to F and know that the result is still 
true over F. However, going to K is no panacea for there are frequent 
situations when the result for K implies nothing for F. This is why we have 
two types of “canonical form” theorems, those which assume that all the 
characteristic roots of T lie in F and those which do not. 

A final word; if $T \in F_n$, by the phrase "a characteristic root of $T$" we shall
mean an element $\lambda$ in the splitting field $K$ of the minimal polynomial
$p(x)$ of $T$ over $F$ such that $\lambda - T$ is not invertible in $K_n$. It is a fact (see
Problem 5) that every root of the minimal polynomial of $T$ over $F$ is a
characteristic root of $T$.


Problems 

1. Prove that the relation of similarity is an equivalence relation in $A(V)$.

2. If $T \in F_n$ and if $K \supset F$, prove that as an element of $K_n$, $T$ is
invertible if and only if it is already invertible in $F_n$.

3. In the proof of Theorem 6.4.1 prove that $v_1, \ldots, v_n$ is a basis of $V$.

4. Give a proof, using matrix computations, that if $A$ is a triangular
$n \times n$ matrix with entries $\lambda_1, \ldots, \lambda_n$ on the diagonal, then

\[ (A - \lambda_1)(A - \lambda_2) \cdots (A - \lambda_n) = 0. \]

*5. If $T \in F_n$ has minimal polynomial $p(x)$ over $F$, prove that every
root of $p(x)$, in its splitting field $K$, is a characteristic root of $T$.

6. If $T \in A(V)$ and if $\lambda \in F$ is a characteristic root of $T$ in $F$, let
$U_\lambda = \{v \in V \mid vT = \lambda v\}$. If $S \in A(V)$ commutes with $T$, prove that $U_\lambda$
is invariant under $S$.

*7. If $\mathcal{M}$ is a commutative set of elements in $A(V)$ such that every
$M \in \mathcal{M}$ has all its characteristic roots in $F$, prove that there is a
$C \in A(V)$ such that every $CMC^{-1}$, for $M \in \mathcal{M}$, is in triangular form.

8. Let $W$ be a subspace of $V$ invariant under $T \in A(V)$. By restricting
$T$ to $W$, $T$ induces a linear transformation $\tilde{T}$ (defined by $w\tilde{T} = wT$
for every $w \in W$). Let $\tilde{p}(x)$ be the minimal polynomial of $\tilde{T}$
over $F$.

(a) Prove that $\tilde{p}(x) \mid p(x)$, the minimal polynomial of $T$ over $F$.

(b) If $T$ induces $\bar{T}$ on $V/W$ satisfying the minimal polynomial $\bar{p}(x)$
over $F$, prove that $p(x) \mid \bar{p}(x)\tilde{p}(x)$.

*(c) If $\tilde{p}(x)$ and $\bar{p}(x)$ are relatively prime, prove that $p(x) = \bar{p}(x)\tilde{p}(x)$.

*(d) Give an example of a $T$ for which $p(x) \neq \bar{p}(x)\tilde{p}(x)$.

9. Let $\mathcal{M}$ be a nonempty set of elements in $A(V)$; the subspace $W \subset V$
is said to be invariant under $\mathcal{M}$ if for every $M \in \mathcal{M}$, $WM \subset W$. If
$W$ is invariant under $\mathcal{M}$ and is of dimension $r$ over $F$, prove that there
exists a basis of $V$ over $F$ such that every $M \in \mathcal{M}$ has a matrix, in
this basis, of the form

\[ \begin{pmatrix} M_1 & 0 \\ M_{12} & M_2 \end{pmatrix}, \]

where $M_1$ is an $r \times r$ matrix and $M_2$ is an $(n - r) \times (n - r)$ matrix.

10. In Problem 9 prove that $M_1$ is the matrix of the linear transformation
$\tilde{M}$ induced by $M$ on $W$, and that $M_2$ is the matrix of the linear
transformation $\bar{M}$ induced by $M$ on $V/W$.

*11. The nonempty set, $\mathcal{M}$, of linear transformations in $A(V)$ is called an
irreducible set if the only subspaces of $V$ invariant under $\mathcal{M}$ are $(0)$
and $V$. If $\mathcal{M}$ is an irreducible set of linear transformations on $V$ and if

\[ D = \{ T \in A(V) \mid TM = MT \text{ for all } M \in \mathcal{M} \}, \]

prove that $D$ is a division ring.

*12. Do Problem 11 by using the result (Schur's lemma) of Problem 14,
end of Chapter 4, page 206.

*13. If $F$ is such that all elements in $A(V)$ have all their characteristic
roots in $F$, prove that the $D$ of Problem 11 consists only of scalars.

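14. Let $F$ be the field of real numbers and let

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \in F_2. \]

(a) Prove that the set $\mathcal{M}$ consisting only of

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]

is an irreducible set.

(b) Find the set $D$ of all matrices commuting with

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]

and prove that $D$ is isomorphic to the field of complex numbers.

15. Let $F$ be the field of real numbers.

(a) Prove that the set

\[
\mathcal{M} = \left\{
\begin{pmatrix} 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & -1 & 0 \end{pmatrix},\;
\begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \end{pmatrix}
\right\}
\]

is an irreducible set.

(b) Find all $A \in F_4$ such that $AM = MA$ for all $M \in \mathcal{M}$.

(c) Prove that the set of all $A$ in part (b) is a division ring isomorphic
to the division ring of quaternions over the real field.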

16. A set of linear transformations, $\mathcal{M} \subset A(V)$, is called decomposable
if there is a subspace $W \subset V$ such that $V = W \oplus W_1$, $W \neq (0)$,
$W \neq V$, and each of $W$ and $W_1$ is invariant under $\mathcal{M}$. If $\mathcal{M}$ is not
decomposable, it is called indecomposable.

(a) If $\mathcal{M}$ is a decomposable set of linear transformations on $V$, prove
that there is a basis of $V$ in which every $M \in \mathcal{M}$ has a matrix
of the form

\[ \begin{pmatrix} M_1 & 0 \\ 0 & M_2 \end{pmatrix}, \]

where $M_1$ and $M_2$ are square matrices.

(b) If $V$ is an $n$-dimensional vector space over $F$ and if $T \in A(V)$
satisfies $T^n = 0$ but $T^{n-1} \neq 0$, prove that the set $\{T\}$ (consisting
of $T$) is indecomposable.

17. Let $T \in A(V)$ and suppose that $p(x)$ is the minimal polynomial for
$T$ over $F$.

(a) If $p(x)$ is divisible by two distinct irreducible polynomials $p_1(x)$
and $p_2(x)$ in $F[x]$, prove that $\{T\}$ is decomposable.

(b) If $\{T\}$, for some $T \in A(V)$, is indecomposable, prove that the
minimal polynomial for $T$ over $F$ is the power of an irreducible
polynomial.

18. If $T \in A(V)$ is nilpotent, prove that $T$ can be brought to triangular
form over $F$, and in that form all the elements on the diagonal are 0.

19. If $T \in A(V)$ has only 0 as a characteristic root, prove that $T$ is
nilpotent.

6.5 Canonical Forms: Nilpotent Transformations 

One class of linear transformations which have all their characteristic roots 
in F is the class of nilpotent ones, for their characteristic roots are all 0, 
hence are in F. Therefore by the result of the previous section a nilpotent 
linear transformation can always be brought to triangular form over F. 
For some purposes this is not sharp enough, and as we shall soon see, a 
great deal more can be said. 

Although the class of nilpotent linear transformations is a rather restricted
one, it nevertheless merits study for its own sake. More important
for our purposes, once we have found a good canonical form for these we
can readily find a good canonical form for all linear transformations which
have all their characteristic roots in $F$.

A word about the line of attack that we shall follow is in order. We
could study these matters from a "ground-up" approach or we could invoke
results about the decomposition of modules which we obtained in Chapter 4.
We have decided on a compromise between the two; we treat the material
in this section and the next (on Jordan forms) independently of the notion
of a module and the results about modules developed in Chapter 4. However,
in the section dealing with the rational canonical form we shall completely
change point of view, introducing via a given linear transformation
a module structure on the vector spaces under discussion; making use of
Theorem 4.5.1 we shall then get a decomposition of a vector space, and the
resulting canonical form, relative to a given linear transformation.

Even though we do not use a module theoretic approach now, the reader
should note the similarity between the arguments used in proving Theorem
4.5.1 and those used to prove Lemma 6.5.4.

Before concentrating our efforts on nilpotent linear transformations we 
prove a result of interest which holds for arbitrary ones. 

LEMMA 6.5.1 If $V = V_1 \oplus V_2 \oplus \cdots \oplus V_k$, where each subspace $V_i$ is of
dimension $n_i$ and is invariant under $T$, an element of $A(V)$, then a basis of $V$ can
be found so that the matrix of $T$ in this basis is of the form

\[
\begin{pmatrix}
A_1 & 0 & \cdots & 0 \\
0 & A_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_k
\end{pmatrix}
\]

where each $A_i$ is an $n_i \times n_i$ matrix and is the matrix of the linear transformation
induced by $T$ on $V_i$.

Proof. Choose a basis of $V$ as follows: $v_1^{(1)}, \ldots, v_{n_1}^{(1)}$ is a basis of $V_1$,
$v_1^{(2)}, \ldots, v_{n_2}^{(2)}$ is a basis of $V_2$, and so on. Since each $V_i$ is invariant
under $T$, $v_j^{(i)} T \in V_i$ so is a linear combination of $v_1^{(i)}, v_2^{(i)}, \ldots, v_{n_i}^{(i)}$,
and of only these. Thus the matrix of $T$ in the basis so chosen is of the
desired form. That each $A_i$ is the matrix of $T_i$, the linear transformation
induced on $V_i$ by $T$, is clear from the very definition of the matrix of a
linear transformation.

We now narrow our attention to nilpotent linear transformations. 

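LEMMA 6.5.2 If $T \in A(V)$ is nilpotent, then $\alpha_0 + \alpha_1 T + \cdots + \alpha_m T^m$,
where the $\alpha_i \in F$, is invertible if $\alpha_0 \neq 0$.

Proof. If $S$ is nilpotent and $\alpha_0 \neq 0 \in F$, a simple computation shows that

\[
(\alpha_0 + S)\left( \frac{1}{\alpha_0} - \frac{S}{\alpha_0^2} + \frac{S^2}{\alpha_0^3} - \cdots + (-1)^{r-1} \frac{S^{r-1}}{\alpha_0^r} \right) = 1,
\]

if $S^r = 0$. Now if $T^r = 0$, $S = \alpha_1 T + \alpha_2 T^2 + \cdots + \alpha_m T^m$ also must
satisfy $S^r = 0$. (Prove!) Thus for $\alpha_0 \neq 0$ in $F$, $\alpha_0 + S$ is invertible.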
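The inverse formula in the proof can be tried out with exact rational arithmetic. This is a minimal sketch under assumed sample data: $T$ is taken to be the 3 x 3 matrix with 1's on the superdiagonal, $S = 2T + 5T^2$, and $\alpha_0 = 3$; the function name `inverse_of_shifted_nilpotent` is illustrative.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def inverse_of_shifted_nilpotent(a0, s, r):
    """Return sum_{j<r} (-1)^j s^j / a0^(j+1), assuming s^r = 0 and a0 != 0."""
    n = len(s)
    total = [[Fraction(0)] * n for _ in range(n)]
    power = identity(n)  # holds s^j
    for j in range(r):
        coeff = Fraction((-1) ** j) / Fraction(a0) ** (j + 1)
        total = add(total, scale(coeff, power))
        power = mat_mul(power, s)
    return total


# Assumed sample data: T nilpotent with T^3 = 0, S = 2T + 5T^2, a0 = 3.
T = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
S = add(scale(2, T), scale(5, mat_mul(T, T)))
a0 = 3
inv = inverse_of_shifted_nilpotent(a0, S, 3)
product = mat_mul(add(scale(a0, identity(3)), S), inv)
```

The product telescopes to $1 - (-1)^r S^r / \alpha_0^r$, which is why the assumption $S^r = 0$ is exactly what is needed.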

Notation. $M_t$ will denote the $t \times t$ matrix

\[
\begin{pmatrix}
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 1 \\
0 & 0 & 0 & \cdots & 0 & 0
\end{pmatrix},
\]

all of whose entries are 0 except on the superdiagonal, where they are all 1's.

DEFINITION If $T \in A(V)$ is nilpotent, then $k$ is called the index of
nilpotence of $T$ if $T^k = 0$ but $T^{k-1} \neq 0$.

The key result about nilpotent linear transformations is 

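THEOREM 6.5.1 If $T \in A(V)$ is nilpotent, of index of nilpotence $n_1$, then a
basis of $V$ can be found such that the matrix of $T$ in this basis has the form

\[
\begin{pmatrix}
M_{n_1} & 0 & \cdots & 0 \\
0 & M_{n_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & M_{n_r}
\end{pmatrix}
\]

where $n_1 \geq n_2 \geq \cdots \geq n_r$ and where $n_1 + n_2 + \cdots + n_r = \dim_F V$.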

Proof. The proof will be a little detailed, so as we proceed we shall
separate parts of it out as lemmas.

Since $T^{n_1} = 0$ but $T^{n_1 - 1} \neq 0$, we can find a vector $v \in V$ such that
$vT^{n_1 - 1} \neq 0$. We claim that the vectors $v, vT, \ldots, vT^{n_1 - 1}$ are linearly
independent over $F$. For, suppose that $\alpha_1 v + \alpha_2 vT + \cdots + \alpha_{n_1} vT^{n_1 - 1} = 0$
where the $\alpha_i \in F$; let $\alpha_s$ be the first nonzero $\alpha$, hence

\[ vT^{s-1}(\alpha_s + \alpha_{s+1} T + \cdots + \alpha_{n_1} T^{n_1 - s}) = 0. \]

Since $\alpha_s \neq 0$, by Lemma 6.5.2, $\alpha_s + \alpha_{s+1} T + \cdots + \alpha_{n_1} T^{n_1 - s}$ is invertible,
and therefore $vT^{s-1} = 0$. However, $s \leq n_1$, so $s - 1 \leq n_1 - 1$ and
$vT^{s-1} = 0$ would force $vT^{n_1 - 1} = 0$; this contradicts that
$vT^{n_1 - 1} \neq 0$. Thus no such nonzero $\alpha_s$ exists and $v, vT, \ldots, vT^{n_1 - 1}$ have
been shown to be linearly independent over $F$.

Let $V_1$ be the subspace of $V$ spanned by $v_1 = v$, $v_2 = vT$, $\ldots$, $v_{n_1} = vT^{n_1 - 1}$;
$V_1$ is invariant under $T$, and, in the basis above, the linear transformation
induced by $T$ on $V_1$ has as matrix $M_{n_1}$.

So far we have produced the upper left-hand corner of the matrix of the 
theorem. We must somehow produce the rest of this matrix. 

LEMMA 6.5.3 If $u \in V_1$ is such that $uT^{n_1 - k} = 0$, where $0 < k \leq n_1$, then
$u = u_0 T^k$ for some $u_0 \in V_1$.

Proof. Since $u \in V_1$, $u = \alpha_1 v + \alpha_2 vT + \cdots + \alpha_k vT^{k-1} + \alpha_{k+1} vT^k + \cdots + \alpha_{n_1} vT^{n_1 - 1}$.
Thus $0 = uT^{n_1 - k} = \alpha_1 vT^{n_1 - k} + \cdots + \alpha_k vT^{n_1 - 1}$.
However, $vT^{n_1 - k}, \ldots, vT^{n_1 - 1}$ are linearly independent over $F$, whence
$\alpha_1 = \alpha_2 = \cdots = \alpha_k = 0$, and so, $u = \alpha_{k+1} vT^k + \cdots + \alpha_{n_1} vT^{n_1 - 1} = u_0 T^k$,
where $u_0 = \alpha_{k+1} v + \cdots + \alpha_{n_1} vT^{n_1 - k - 1} \in V_1$.

The argument, so far, has been fairly straightforward. Now it becomes
a little sticky.

LEMMA 6.5.4 There exists a subspace $W$ of $V$, invariant under $T$, such that
$V = V_1 \oplus W$.

Proof. Let $W$ be a subspace of $V$, of largest possible dimension, such that

1. $V_1 \cap W = (0)$;

2. $W$ is invariant under $T$.

We want to show that $V = V_1 + W$. Suppose not; then there exists an
element $z \in V$ such that $z \notin V_1 + W$. Since $T^{n_1} = 0$, there exists an integer
$k$, $0 < k \leq n_1$, such that $zT^k \in V_1 + W$ and such that $zT^i \notin V_1 + W$
for $i < k$. Thus $zT^k = u + w$, where $u \in V_1$ and where $w \in W$. But then
$0 = zT^{n_1} = (zT^k)T^{n_1 - k} = uT^{n_1 - k} + wT^{n_1 - k}$; however, since both $V_1$
and $W$ are invariant under $T$, $uT^{n_1 - k} \in V_1$ and $wT^{n_1 - k} \in W$. Now, since
$V_1 \cap W = (0)$, this leads to $uT^{n_1 - k} = -wT^{n_1 - k} \in V_1 \cap W = (0)$, resulting
in $uT^{n_1 - k} = 0$. By Lemma 6.5.3, $u = u_0 T^k$ for some $u_0 \in V_1$; therefore,
$zT^k = u + w = u_0 T^k + w$. Let $z_1 = z - u_0$; then $z_1 T^k = zT^k - u_0 T^k = w \in W$,
and since $W$ is invariant under $T$ this yields $z_1 T^m \in W$ for all
$m \geq k$. On the other hand, if $i < k$, $z_1 T^i = zT^i - u_0 T^i \notin V_1 + W$, for
otherwise $zT^i$ must fall in $V_1 + W$, contradicting the choice of $k$.

Let $W_1$ be the subspace of $V$ spanned by $W$ and $z_1, z_1 T, \ldots, z_1 T^{k-1}$.
Since $z_1 \notin W$, and since $W_1 \supset W$, the dimension of $W_1$ must be larger than
that of $W$. Moreover, since $z_1 T^k \in W$ and since $W$ is invariant under $T$,
$W_1$ must be invariant under $T$. By the maximal nature of $W$ there must
be an element of the form $w_0 + \alpha_1 z_1 + \alpha_2 z_1 T + \cdots + \alpha_k z_1 T^{k-1} \neq 0$ in
$W_1 \cap V_1$, where $w_0 \in W$. Not all of $\alpha_1, \ldots, \alpha_k$ can be 0; otherwise we
would have $0 \neq w_0 \in W \cap V_1 = (0)$, a contradiction. Let $\alpha_s$ be the first
nonzero $\alpha$; then $w_0 + z_1 T^{s-1}(\alpha_s + \alpha_{s+1} T + \cdots + \alpha_k T^{k-s}) \in V_1$. Since
$\alpha_s \neq 0$, by Lemma 6.5.2, $\alpha_s + \alpha_{s+1} T + \cdots + \alpha_k T^{k-s}$ is invertible and its
inverse, $R$, is a polynomial in $T$. Thus $W$ and $V_1$ are invariant under $R$;
however, from the above, $w_0 R + z_1 T^{s-1} \in V_1 R \subset V_1$, forcing
$z_1 T^{s-1} \in V_1 + WR \subset V_1 + W$. Since $s - 1 < k$ this is impossible; therefore
$V_1 + W = V$. Because $V_1 \cap W = (0)$, $V = V_1 \oplus W$, and the lemma is
proved.


The hard work, for the moment, is over; we now complete the proof of
Theorem 6.5.1.

By Lemma 6.5.4, $V = V_1 \oplus W$ where $W$ is invariant under $T$. Using
the basis $v_1, \ldots, v_{n_1}$ of $V_1$ and any basis of $W$ as a basis of $V$, by Lemma
6.5.1, the matrix of $T$ in this basis has the form

\[ \begin{pmatrix} M_{n_1} & 0 \\ 0 & A_2 \end{pmatrix} \]

where $A_2$ is the matrix of $T_2$, the linear transformation induced on $W$ by $T$.
Since $T^{n_1} = 0$, $T_2^{n_2} = 0$ for some $n_2 \leq n_1$. Repeating the argument used
for $T$ on $V$ for $T_2$ on $W$ we can decompose $W$ as we did $V$ (or, invoke an
induction on the dimension of the vector space involved). Continuing this
way, we get a basis of $V$ in which the matrix of $T$ is of the form

\[
\begin{pmatrix}
M_{n_1} & 0 & \cdots & 0 \\
0 & M_{n_2} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & M_{n_r}
\end{pmatrix}.
\]

That $n_1 + n_2 + \cdots + n_r = \dim V$ is clear, since the size of the matrix is
$n \times n$ where $n = \dim V$.

DEFINITION The integers $n_1, n_2, \ldots, n_r$ are called the invariants of $T$.

DEFINITION If $T \in A(V)$ is nilpotent, the subspace $M$ of $V$, of dimension
$m$, which is invariant under $T$, is called cyclic with respect to $T$ if

1. $MT^m = (0)$, $MT^{m-1} \neq (0)$;

2. there is an element $z \in M$ such that $z, zT, \ldots, zT^{m-1}$ form a basis of $M$.
(Note: Condition 1 is actually implied by Condition 2.)

LEMMA 6.5.5 If $M$, of dimension $m$, is cyclic with respect to $T$, then the
dimension of $MT^k$ is $m - k$ for all $k \leq m$.

Proof. A basis of $MT^k$ is provided us by taking the image of any basis of
$M$ under $T^k$. Using the basis $z, zT, \ldots, zT^{m-1}$ of $M$ leads to a basis $zT^k$,
$zT^{k+1}, \ldots, zT^{m-1}$ of $MT^k$. Since this basis has $m - k$ elements, the
lemma is proved.

Theorem 6.5.1 tells us that given a nilpotent $T$ in $A(V)$ we can find
integers $n_1 \geq n_2 \geq \cdots \geq n_r$ and subspaces $V_1, \ldots, V_r$ of $V$ cyclic with
respect to $T$ and of dimensions $n_1, n_2, \ldots, n_r$, respectively, such that
$V = V_1 \oplus \cdots \oplus V_r$.

Is it possible that we can find other integers $m_1 \geq m_2 \geq \cdots \geq m_s$ and
subspaces $U_1, \ldots, U_s$ of $V$, cyclic with respect to $T$ and of dimensions
$m_1, \ldots, m_s$, respectively, such that $V = U_1 \oplus \cdots \oplus U_s$? We claim that
we cannot, or in other words that $s = r$ and $m_1 = n_1$, $m_2 = n_2$, $\ldots$, $m_r = n_r$.
Suppose that this were not the case; then there is a first integer $i$ such
that $m_i \neq n_i$. We may assume that $m_i < n_i$.

Consider $VT^{m_i}$. On one hand, since $V = V_1 \oplus \cdots \oplus V_r$,
$VT^{m_i} = V_1 T^{m_i} \oplus \cdots \oplus V_i T^{m_i} \oplus \cdots \oplus V_r T^{m_i}$. Since $\dim V_1 T^{m_i} = n_1 - m_i$,
$\dim V_2 T^{m_i} = n_2 - m_i$, $\ldots$, $\dim V_i T^{m_i} = n_i - m_i$ (by Lemma 6.5.5),
$\dim VT^{m_i} \geq (n_1 - m_i) + (n_2 - m_i) + \cdots + (n_i - m_i)$. On the other
hand, since $V = U_1 \oplus \cdots \oplus U_s$ and since $U_j T^{m_i} = (0)$ for $j \geq i$,
$VT^{m_i} = U_1 T^{m_i} \oplus U_2 T^{m_i} \oplus \cdots \oplus U_{i-1} T^{m_i}$. Thus

\[ \dim VT^{m_i} = (m_1 - m_i) + (m_2 - m_i) + \cdots + (m_{i-1} - m_i). \]

By our choice of $i$, $n_1 = m_1$, $n_2 = m_2$, $\ldots$, $n_{i-1} = m_{i-1}$, whence

\[ \dim VT^{m_i} = (n_1 - m_i) + (n_2 - m_i) + \cdots + (n_{i-1} - m_i). \]

However, this contradicts the fact proved above that
$\dim VT^{m_i} \geq (n_1 - m_i) + \cdots + (n_{i-1} - m_i) + (n_i - m_i)$, since $n_i - m_i > 0$.

Thus there is a unique set of integers $n_1 \geq n_2 \geq \cdots \geq n_r$ such that $V$ is
the direct sum of subspaces, cyclic with respect to $T$, of dimensions $n_1, n_2, \ldots, n_r$.
Equivalently, we have shown that the invariants of $T$ are unique.
Matricially, the argument just carried out has proved that if $n_1 \geq n_2 \geq \cdots \geq n_r$
and $m_1 \geq m_2 \geq \cdots \geq m_s$, then the matrices

\[
\begin{pmatrix} M_{n_1} & & 0 \\ & \ddots & \\ 0 & & M_{n_r} \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} M_{m_1} & & 0 \\ & \ddots & \\ 0 & & M_{m_s} \end{pmatrix}
\]

are similar only if $r = s$ and $n_1 = m_1$, $n_2 = m_2$, $\ldots$, $n_r = m_r$.
So far we have proved the more difficult half of


THEOREM 6.5.2 Two nilpotent linear transformations are similar if and only 
if they have the same invariants. 

Proof. The discussion preceding the theorem has proved that if the two
nilpotent linear transformations have different invariants, then they cannot
be similar, for their respective matrices

\[
\begin{pmatrix} M_{n_1} & & 0 \\ & \ddots & \\ 0 & & M_{n_r} \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} M_{m_1} & & 0 \\ & \ddots & \\ 0 & & M_{m_s} \end{pmatrix}
\]

cannot be similar.

In the other direction, if the two nilpotent linear transformations $S$ and $T$
have the same invariants $n_1 \geq \cdots \geq n_r$, by Theorem 6.5.1 there are bases
$v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ of $V$ such that the matrix of $S$ in $v_1, \ldots, v_n$ and
that of $T$ in $w_1, \ldots, w_n$, are each equal to

\[ \begin{pmatrix} M_{n_1} & & 0 \\ & \ddots & \\ 0 & & M_{n_r} \end{pmatrix}. \]

But if $A$ is the linear transformation defined on $V$ by $v_i A = w_i$, then
$S = ATA^{-1}$ (Prove! Compare with Problem 32 at the end of Section 6.3),
whence $S$ and $T$ are similar.
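Lemma 6.5.5 also suggests a way to compute the invariants without finding a basis: since $\dim VT^j$ is the sum of the $n_i - j$ over those $i$ with $n_i > j$, the number of invariants that are at least $j$ equals $\operatorname{rank}(T^{j-1}) - \operatorname{rank}(T^j)$. The sketch below is one possible implementation of that observation, using exact rational arithmetic; the function names `rank` and `invariants` and the sample matrices are illustrative, not part of the text.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def rank(a):
    """Rank of a matrix, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in a]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r


def invariants(t):
    """Invariants n_1 >= n_2 >= ... of a nilpotent square matrix t."""
    n = len(t)
    ranks = [n]  # ranks[j] will hold rank(t^j)
    power = [row[:] for row in t]
    while ranks[-1] > 0:
        ranks.append(rank(power))
        power = mat_mul(power, t)
        if len(ranks) > n + 1:
            raise ValueError("matrix is not nilpotent")
    # at_least[j - 1] = number of invariants that are >= j
    at_least = [ranks[j - 1] - ranks[j] for j in range(1, len(ranks))]
    result = []
    for size in range(len(at_least), 0, -1):
        bigger = at_least[size] if size < len(at_least) else 0
        result.extend([size] * (at_least[size - 1] - bigger))
    return result


# Two sample nilpotent 4 x 4 matrices with different invariants.
first = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
second = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
inv_first = invariants(first)
inv_second = invariants(second)
```

By Theorem 6.5.2, comparing the two lists returned by `invariants` decides whether two nilpotent matrices are similar.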

Let us compute an example. Let

\[ T = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \in F_3 \]

act on $F^{(3)}$ with basis $u_1 = (1, 0, 0)$, $u_2 = (0, 1, 0)$, $u_3 = (0, 0, 1)$. Let
$v_1 = u_1$, $v_2 = u_1 T = u_2 + u_3$, $v_3 = u_3$; in the basis $v_1, v_2, v_3$ the matrix
of $T$ is

\[ \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \]

so that the invariants of $T$ are 2, 1. If $A$ is the matrix of the change of
basis, namely

\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \]

a simple computation shows that

\[ ATA^{-1} = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]
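The "simple computation" can be carried out mechanically. The sketch below assumes the matrices as read off from the relations $v_1 = u_1$, $v_2 = u_1 T = u_2 + u_3$, $v_3 = u_3$: the rows of $T$ are $(0, 1, 1)$, $(0, 0, 0)$, $(0, 0, 0)$ and the rows of $A$ are the coordinates of $v_1, v_2, v_3$. The inverse `A_inv` is written down by hand and verified rather than computed.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Assumed data, reconstructed from u_1 T = u_2 + u_3 and u_2 T = u_3 T = 0.
T = [[0, 1, 1],
     [0, 0, 0],
     [0, 0, 0]]

# Rows of A are v_1, v_2, v_3 written in the basis u_1, u_2, u_3.
A = [[1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]

# Candidate inverse of A, checked below.
A_inv = [[1, 0, 0],
         [0, 1, -1],
         [0, 0, 1]]

identity_check = mat_mul(A, A_inv)
conjugate = mat_mul(mat_mul(A, T), A_inv)  # A T A^{-1}
```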


One final remark: the invariants of $T$ determine a partition of $n$, the
dimension of $V$. Conversely, any partition of $n$, $n_1 \geq \cdots \geq n_r$,
$n_1 + n_2 + \cdots + n_r = n$, determines the invariants of the nilpotent linear
transformation

\[ \begin{pmatrix} M_{n_1} & & 0 \\ & \ddots & \\ 0 & & M_{n_r} \end{pmatrix}. \]

Thus the number of distinct similarity classes of nilpotent $n \times n$ matrices is precisely
$p(n)$, the number of partitions of $n$.
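To get a feeling for how many similarity classes this gives, the partition numbers can be tabulated. The following is a small sketch using the standard dynamic-programming recurrence for counting partitions (the function name `partition_count` is illustrative).

```python
def partition_count(n):
    """Number of partitions of n, allowing parts of size 1, 2, ..., n in turn."""
    ways = [1] + [0] * n  # ways[t] = partitions of t using the parts seen so far
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


# p(1), ..., p(7): the number of similarity classes of nilpotent
# n x n matrices for n = 1, ..., 7.
counts = [partition_count(n) for n in range(1, 8)]
```

For instance, $p(3) = 3$ corresponds to the invariants $3$; $2, 1$; and $1, 1, 1$.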



6.6 Canonical Forms: A Decomposition of V : Jordan Form 

Let $V$ be a finite-dimensional vector space over $F$ and let $T$ be an arbitrary
element in $A_F(V)$. Suppose that $V_1$ is a subspace of $V$ invariant under $T$.
Therefore $T$ induces a linear transformation $T_1$ on $V_1$ defined by $uT_1 = uT$
for every $u \in V_1$. Given any polynomial $q(x) \in F[x]$, we claim that
the linear transformation induced by $q(T)$ on $V_1$ is precisely $q(T_1)$. (The
proof of this is left as an exercise.) In particular, if $q(T) = 0$ then $q(T_1) = 0$.
Thus $T_1$ satisfies any polynomial satisfied by $T$ over $F$. What can be
said in the opposite direction?

LEMMA 6.6.1 Suppose that $V = V_1 \oplus V_2$, where $V_1$ and $V_2$ are subspaces
of $V$ invariant under $T$. Let $T_1$ and $T_2$ be the linear transformations induced by
$T$ on $V_1$ and $V_2$, respectively. If the minimal polynomial of $T_1$ over $F$ is $p_1(x)$ while
that of $T_2$ is $p_2(x)$, then the minimal polynomial for $T$ over $F$ is the least common
multiple of $p_1(x)$ and $p_2(x)$.

Proof. If $p(x)$ is the minimal polynomial for $T$ over $F$, as we have seen
above, both $p(T_1)$ and $p(T_2)$ are zero, whence $p_1(x) \mid p(x)$ and $p_2(x) \mid p(x)$.
But then the least common multiple of $p_1(x)$ and $p_2(x)$ must also divide $p(x)$.

On the other hand, if $q(x)$ is the least common multiple of $p_1(x)$ and
$p_2(x)$, consider $q(T)$. For $v_1 \in V_1$, since $p_1(x) \mid q(x)$, $v_1 q(T) = v_1 q(T_1) = 0$;
similarly, for $v_2 \in V_2$, $v_2 q(T) = 0$. Given any $v \in V$, $v$ can be written as
$v = v_1 + v_2$, where $v_1 \in V_1$ and $v_2 \in V_2$, in consequence of which
$vq(T) = (v_1 + v_2)q(T) = v_1 q(T) + v_2 q(T) = 0$. Thus $q(T) = 0$ and $T$ satisfies
$q(x)$. Combined with the result of the first paragraph, this yields the lemma.

COROLLARY If $V = V_1 \oplus \cdots \oplus V_k$ where each $V_i$ is invariant under $T$
and if $p_i(x)$ is the minimal polynomial over $F$ of $T_i$, the linear transformation induced
by $T$ on $V_i$, then the minimal polynomial of $T$ over $F$ is the least common multiple
of $p_1(x), p_2(x), \ldots, p_k(x)$.

We leave the proof of the corollary to the reader. 
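A small numerical illustration of Lemma 6.6.1, under assumed sample data: take a 4 x 4 matrix in block diagonal form whose first block has minimal polynomial $(x - 1)^2$ and whose second block has minimal polynomial $(x - 1)(x - 2)$. The least common multiple is $(x - 1)^2 (x - 2)$, of degree 3, whereas the product of the two minimal polynomials has degree 4. The sketch checks that the least common multiple annihilates the matrix while its two maximal proper divisors do not; all names are illustrative.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def minus_scalar(a, lam):
    """Return a - lam * I."""
    n = len(a)
    return [[a[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def product_at(t, roots):
    """Evaluate the product of (x - r), r in roots (with repetition), at x = t."""
    n = len(t)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for r in roots:
        result = mat_mul(result, minus_scalar(t, r))
    return result


def is_zero(m):
    return all(x == 0 for row in m for x in row)


# Assumed sample: blocks [[1, 0], [1, 1]] and [[1, 0], [0, 2]] on the diagonal.
T = [[1, 0, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, 2]]

lcm_vanishes = is_zero(product_at(T, [1, 1, 2]))  # (x - 1)^2 (x - 2)
proper_divisors_vanish = [is_zero(product_at(T, roots))
                          for roots in ([1, 2], [1, 1])]
```

Every proper monic divisor of $(x - 1)^2 (x - 2)$ divides $(x - 1)(x - 2)$ or $(x - 1)^2$, so checking these two is enough to confirm minimality in this example.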

Let $T \in A_F(V)$ and suppose that $p(x)$ in $F[x]$ is the minimal polynomial
of $T$ over $F$. By Lemma 3.9.5, we can factor $p(x)$ in $F[x]$ in a unique way
as $p(x) = q_1(x)^{l_1} q_2(x)^{l_2} \cdots q_k(x)^{l_k}$, where the $q_i(x)$ are distinct irreducible
polynomials in $F[x]$ and where $l_1, l_2, \ldots, l_k$ are positive integers. Our
objective is to decompose $V$ as a direct sum of subspaces invariant under
$T$ such that on each of these the linear transformation induced by $T$ has,
as minimal polynomial, a power of an irreducible polynomial. If $k = 1$,
$V$ itself already does this for us. So, suppose that $k > 1$.

Let $V_1 = \{v \in V \mid v q_1(T)^{l_1} = 0\}$, $V_2 = \{v \in V \mid v q_2(T)^{l_2} = 0\}$, $\ldots$,
$V_k = \{v \in V \mid v q_k(T)^{l_k} = 0\}$. It is a triviality that each $V_i$ is a subspace
of $V$. In addition, $V_i$ is invariant under $T$, for if $u \in V_i$, since $T$ and $q_i(T)$
commute, $(uT) q_i(T)^{l_i} = (u q_i(T)^{l_i}) T = 0T = 0$. By the definition of $V_i$,
this places $uT$ in $V_i$. Let $T_i$ be the linear transformation induced by $T$ on $V_i$.

THEOREM 6.6.1 For each $i = 1, 2, \ldots, k$, $V_i \neq (0)$ and
$V = V_1 \oplus V_2 \oplus \cdots \oplus V_k$. The minimal polynomial of $T_i$ is $q_i(x)^{l_i}$.

Proof. If $k = 1$ then $V = V_1$ and there is nothing that needs proving.
Suppose then that $k > 1$.

We first want to prove that each $V_i \neq (0)$. Towards this end, we introduce
the $k$ polynomials:

\[
\begin{aligned}
h_1(x) &= q_2(x)^{l_2} q_3(x)^{l_3} \cdots q_k(x)^{l_k}, \\
h_2(x) &= q_1(x)^{l_1} q_3(x)^{l_3} \cdots q_k(x)^{l_k}, \;\ldots, \\
h_i(x) &= \prod_{j \neq i} q_j(x)^{l_j}, \;\ldots, \\
h_k(x) &= q_1(x)^{l_1} q_2(x)^{l_2} \cdots q_{k-1}(x)^{l_{k-1}}.
\end{aligned}
\]

Since $k > 1$, $h_i(x) \neq p(x)$; indeed $h_i(x)$ is a divisor of $p(x)$ of lower degree,
whence, by the minimality of $p(x)$, $h_i(T) \neq 0$. Thus, given $i$, there is a
$v \in V$ such that $w = v h_i(T) \neq 0$. But $w q_i(T)^{l_i} = v(h_i(T) q_i(T)^{l_i}) = v p(T) = 0$.
In consequence, $w \neq 0$ is in $V_i$ and so $V_i \neq (0)$. In fact, we have
shown a little more, namely, that $V h_i(T) \neq (0)$ is in $V_i$. Another remark
about the $h_i(x)$ is in order now: if $v_j \in V_j$ for $j \neq i$, since $q_j(x)^{l_j} \mid h_i(x)$,
$v_j h_i(T) = 0$.

The polynomials $h_1(x), h_2(x), \ldots, h_k(x)$ are relatively prime. (Prove!)
Hence by Lemma 3.9.4 we can find polynomials $a_1(x), \ldots, a_k(x)$ in
$F[x]$ such that $a_1(x) h_1(x) + \cdots + a_k(x) h_k(x) = 1$. From this we get
$a_1(T) h_1(T) + \cdots + a_k(T) h_k(T) = 1$, whence, given $v \in V$,
$v = v1 = v(a_1(T) h_1(T) + \cdots + a_k(T) h_k(T)) = v a_1(T) h_1(T) + \cdots + v a_k(T) h_k(T)$.
Now, each $v a_i(T) h_i(T)$ is in $V h_i(T)$, and since we have shown above that
$V h_i(T) \subset V_i$, we have now exhibited $v$ as $v = v_1 + \cdots + v_k$, where each
$v_i = v a_i(T) h_i(T)$ is in $V_i$. Thus $V = V_1 + V_2 + \cdots + V_k$.

We must now verify that this sum is a direct sum. To show this, it is
enough to prove that if $u_1 + u_2 + \cdots + u_k = 0$ with each $u_i \in V_i$, then
each $u_i = 0$. So, suppose that $u_1 + u_2 + \cdots + u_k = 0$ and that some $u_i$,
say $u_1$, is not 0. Multiply this relation by $h_1(T)$; we obtain
$u_1 h_1(T) + \cdots + u_k h_1(T) = 0 h_1(T) = 0$. However, $u_j h_1(T) = 0$ for $j \neq 1$ since $u_j \in V_j$;
the equation thus reduces to $u_1 h_1(T) = 0$. But $u_1 q_1(T)^{l_1} = 0$ and since
$h_1(x)$ and $q_1(x)^{l_1}$ are relatively prime, we are led to $u_1 = 0$ (Prove!) which
is, of course, inconsistent with the assumption that $u_1 \neq 0$. So far we
have succeeded in proving that $V = V_1 \oplus V_2 \oplus \cdots \oplus V_k$.

To complete the proof of the theorem, we must still prove that the
minimal polynomial of $T_i$ on $V_i$ is $q_i(x)^{l_i}$. By the definition of $V_i$, since
$V_i q_i(T)^{l_i} = 0$, $q_i(T_i)^{l_i} = 0$, whence the minimal equation of $T_i$ must be a
divisor of $q_i(x)^{l_i}$, thus of the form $q_i(x)^{f_i}$ with $f_i \leq l_i$. By the corollary to
Lemma 6.6.1 the minimal polynomial of $T$ over $F$ is the least common
multiple of $q_1(x)^{f_1}, \ldots, q_k(x)^{f_k}$ and so must be $q_1(x)^{f_1} \cdots q_k(x)^{f_k}$. Since
this minimal polynomial is in fact $q_1(x)^{l_1} \cdots q_k(x)^{l_k}$ we must have that
$f_1 \geq l_1$, $f_2 \geq l_2$, $\ldots$, $f_k \geq l_k$. Combined with the opposite inequality
above, this yields the desired result $l_i = f_i$ for $i = 1, 2, \ldots, k$ and so completes
the proof of the theorem.
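The decomposition of Theorem 6.6.1 can be observed on a concrete matrix. The sketch below assumes a hypothetical 3 x 3 matrix `T` whose minimal polynomial is $(x - 1)^2 (x - 2)$, so that $q_1(x) = x - 1$, $l_1 = 2$, $q_2(x) = x - 2$, $l_2 = 1$. It computes $\dim V_i$ as $n - \operatorname{rank}(q_i(T)^{l_i})$ and checks that the dimensions add up to $n$, as a direct sum decomposition requires. The helper names are illustrative.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def minus_scalar(a, lam):
    """Return a - lam * I."""
    n = len(a)
    return [[a[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]


def mat_pow(a, e):
    """a raised to the power e, for an integer e >= 1."""
    result = a
    for _ in range(e - 1):
        result = mat_mul(result, a)
    return result


def rank(a):
    """Rank of a matrix, by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in a]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == rows:
            break
    return r


# Assumed sample matrix with minimal polynomial (x - 1)^2 (x - 2).
T = [[1, 0, 0],
     [1, 1, 0],
     [1, 1, 2]]
n = len(T)
factors = [(1, 2), (2, 1)]  # pairs (lambda_i, l_i)

# dim V_i = n - rank((T - lambda_i)^{l_i})
dims = [n - rank(mat_pow(minus_scalar(T, lam), l)) for lam, l in factors]

# p(T) = (T - 1)^2 (T - 2) vanishes, but (T - 1)(T - 2) does not.
p_of_T = mat_mul(mat_pow(minus_scalar(T, 1), 2), minus_scalar(T, 2))
smaller = mat_mul(minus_scalar(T, 1), minus_scalar(T, 2))
```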


If all the characteristic roots of $T$ should happen to lie in $F$, then
the minimal polynomial of $T$ takes on the especially nice form
$q(x) = (x - \lambda_1)^{l_1} \cdots (x - \lambda_k)^{l_k}$ where $\lambda_1, \ldots, \lambda_k$ are the distinct characteristic
roots of $T$. The irreducible factors $q_i(x)$ above are merely $q_i(x) = x - \lambda_i$.
Note that on $V_i$, $T_i$ only has $\lambda_i$ as a characteristic root.


COROLLARY If all the distinct characteristic roots $\lambda_1, \ldots, \lambda_k$ of $T$ lie in $F$, then
$V$ can be written as $V = V_1 \oplus \cdots \oplus V_k$ where $V_i = \{v \in V \mid v(T - \lambda_i)^{l_i} = 0\}$
and where $T_i$ has only one characteristic root, $\lambda_i$, on $V_i$.

Let us go back to the theorem for a moment; we use the same notation
$T_i$, $V_i$ as in the theorem. Since $V = V_1 \oplus \cdots \oplus V_k$, if $\dim V_i = n_i$, by
Lemma 6.5.1 we can find a basis of $V$ such that in this basis the matrix of
$T$ is of the form

\[
\begin{pmatrix}
A_1 & & & \\
& A_2 & & \\
& & \ddots & \\
& & & A_k
\end{pmatrix}
\]

where each $A_i$ is an $n_i \times n_i$ matrix and is in fact the matrix of $T_i$.

What exactly are we looking for? We want an element in the similarity 
class of T which we can distinguish in some way. In light of Theorem 6.3.2 
this can be rephrased as follows: We seek a basis of V in which the matrix 
of T has an especially simple (and recognizable) form. 

By the discussion above, this search can be limited to the linear transformations
$T_i$; thus the general problem can be reduced from the discussion
of general linear transformations to that of the special linear transformations
whose minimal polynomials are powers of irreducible polynomials. For
the special situation in which all the characteristic roots of $T$ lie in $F$ we do
it below. The general case in which we put no restrictions on the characteristic
roots of $T$ will be done in the next section.

We are now in the happy position where all the pieces have been constructed
and all we have to do is to put them together. This results in the
highly important and useful theorem in which is exhibited what is usually
called the Jordan canonical form. But first a definition.


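DEFINITION The matrix

\[
\begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & \cdots & 0 & \lambda
\end{pmatrix},
\]

with $\lambda$'s on the diagonal, 1's on the superdiagonal, and 0's elsewhere, is a
basic Jordan block belonging to $\lambda$.


THEOREM 6.6.2 Let $T \in A_F(V)$ have all its distinct characteristic roots,
$\lambda_1, \ldots, \lambda_k$, in $F$. Then a basis of $V$ can be found in which the matrix $T$ is of the
form

\[
\begin{pmatrix}
J_1 & & & \\
& J_2 & & \\
& & \ddots & \\
& & & J_k
\end{pmatrix}
\]

where each

\[
J_i = \begin{pmatrix}
B_{i1} & & & \\
& B_{i2} & & \\
& & \ddots & \\
& & & B_{i r_i}
\end{pmatrix}
\]

and where $B_{i1}, \ldots, B_{i r_i}$ are basic Jordan blocks belonging to $\lambda_i$.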

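Proof. Before starting, note that an $m \times m$ basic Jordan block belonging
to $\lambda$ is merely $\lambda + M_m$, where $M_m$ is as defined at the end of Lemma 6.5.2.

By the combinations of Lemma 6.5.1 and the corollary to Theorem 6.6.1,
we can reduce to the case when $T$ has only one characteristic root $\lambda$, that is,
$T - \lambda$ is nilpotent. Thus $T = \lambda + (T - \lambda)$, and since $T - \lambda$ is nilpotent,
by Theorem 6.5.1 there is a basis in which its matrix is of the
form

\[ \begin{pmatrix} M_{n_1} & & \\ & \ddots & \\ & & M_{n_r} \end{pmatrix}. \]

But then the matrix of $T$ is of the form

\[
\begin{pmatrix} \lambda & & \\ & \ddots & \\ & & \lambda \end{pmatrix}
+ \begin{pmatrix} M_{n_1} & & \\ & \ddots & \\ & & M_{n_r} \end{pmatrix}
= \begin{pmatrix} B_{n_1} & & \\ & \ddots & \\ & & B_{n_r} \end{pmatrix},
\]

using the first remark made in this proof about the relation of a basic Jordan
block and the $M_m$'s. This completes the theorem.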
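The remark that a basic Jordan block is $\lambda + M_m$ is easy to check by machine. The sketch below builds basic Jordan blocks, assembles a block diagonal matrix from them, and verifies for one block that subtracting $\lambda$ leaves a nilpotent matrix of index exactly $m$. The sample values ($\lambda = 5, 7$ and the block sizes) and the helper names are hypothetical.

```python
def jordan_block(lam, m):
    """m x m matrix with lam on the diagonal and 1's on the superdiagonal."""
    return [[lam if j == i else 1 if j == i + 1 else 0 for j in range(m)]
            for i in range(m)]


def block_diag(blocks):
    """Place the given square blocks along the diagonal, zeros elsewhere."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                out[offset + i][offset + j] = x
        offset += len(b)
    return out


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


B = jordan_block(5, 3)
# N = B - 5 should be the matrix M_3 with 1's on the superdiagonal.
N = [[B[i][j] - (5 if i == j else 0) for j in range(3)] for i in range(3)]
N2 = mat_mul(N, N)
N3 = mat_mul(N2, N)

# A matrix in Jordan form with characteristic roots 5 (blocks of size 2, 1)
# and 7 (one block of size 2).
J = block_diag([jordan_block(5, 2), jordan_block(5, 1), jordan_block(7, 2)])
```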

Using Theorem 6.5.1 we could arrange things so that in each $J_i$ the size
of $B_{i1} \geq$ size of $B_{i2} \geq \cdots$. When this has been done, then the matrix

\[ \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_k \end{pmatrix} \]

is called the Jordan form of $T$. Note that Theorem 6.6.2, for nilpotent
matrices, reduces to Theorem 6.5.1.

We leave as an exercise the following: Two linear transformations in
$A_F(V)$ which have all their characteristic roots in $F$ are similar if and only if they
can be brought to the same Jordan form.

Thus the Jordan form acts as a "determiner" for similarity classes of this
type of linear transformation.

In matrix terms Theorem 6.6.2 can be stated as follows: Let $A \in F_n$
and suppose that $K$ is the splitting field of the minimal polynomial of $A$ over $F$;
then an invertible matrix $C \in K_n$ can be found so that $CAC^{-1}$ is in Jordan form.

We leave the few small points needed to make the transition from Theorem
6.6.2 to its matrix form, just given, to the reader.

One final remark: If $A \in F_n$ and if in $K_n$, where K is the splitting field
of the minimal polynomial of A over F,

$$CAC^{-1} = \begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_k \end{pmatrix},$$

where each $J_i$ corresponds to a different characteristic root, $\lambda_i$, of A, then
the multiplicity of $\lambda_i$ as a characteristic root of A is defined to be $n_i$, where $J_i$
is an $n_i \times n_i$ matrix. Note that the sum of the multiplicities is exactly n.

Clearly we can similarly define the multiplicity of a characteristic root 
of a linear transformation. 


Problems 

1. If S and T are nilpotent linear transformations which commute, 
prove that ST and S + T are nilpotent linear transformations. 

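2. By a direct matrix computation, show that

$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

are not similar.

3. If $n_1 \geq n_2$ and $m_1 \geq m_2$, by a direct matrix computation prove that

$$\begin{pmatrix} M_{n_1} & \\ & M_{n_2} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} M_{m_1} & \\ & M_{m_2} \end{pmatrix}$$

are similar if and only if $n_1 = m_1$, $n_2 = m_2$.

*4. If $n_1 \geq n_2 \geq n_3$ and $m_1 \geq m_2 \geq m_3$, by a direct matrix computation
prove that

$$\begin{pmatrix} M_{n_1} & & \\ & M_{n_2} & \\ & & M_{n_3} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} M_{m_1} & & \\ & M_{m_2} & \\ & & M_{m_3} \end{pmatrix}$$

are similar if and only if $n_1 = m_1$, $n_2 = m_2$, $n_3 = m_3$.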

5. (a) Prove that the matrix

$$\begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 1 & 0 \end{pmatrix}$$

is nilpotent, and find its invariants and Jordan form.

(b) Prove that the matrix in part (a) is not similar to

$$\begin{pmatrix} 1 & 1 & 1 \\ -1 & -1 & -1 \\ 1 & 0 & 0 \end{pmatrix}.$$

6. Prove Lemma 6.6.1 and its corollary even if the sums involved are not 
direct sums. 

7. Prove the statement made to the effect that two linear transformations
in $A_F(V)$ all of whose characteristic roots lie in F are similar if and
only if their Jordan forms are the same (except for a permutation in
the ordering of the characteristic roots).

8. Complete the proof of the matrix version of Theorem 6.6.2, given in 
the text. 

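9. Prove that the $n \times n$ matrix

$$\begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix},$$

having entries 1's on the subdiagonal and 0's elsewhere, is similar to $M_n$.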


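10. If F has characteristic $p > 0$ prove that $A = \begin{pmatrix} 1 & \alpha \\ 0 & 1 \end{pmatrix}$
satisfies $A^p = 1$.

11. If F has characteristic 0 prove that $A = \begin{pmatrix} 1 & \alpha \\ 0 & 1 \end{pmatrix}$ satisfies $A^m = 1$,
for $m > 0$, only if $\alpha = 0$.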

12. Find all possible Jordan forms for

(a) All $8 \times 8$ matrices having $x^2(x - 1)^3$ as minimal polynomial.

(b) All $10 \times 10$ matrices, over a field of characteristic different from
2, having $x^2(x - 1)^2(x + 1)^3$ as minimal polynomial.

13. Prove that the $n \times n$ matrix

$$A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}$$

is similar to

$$\begin{pmatrix} n & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}$$

if the characteristic of F is 0 or if it is p and $p \nmid n$. What is the
multiplicity of 0 as a characteristic root of A?






A matrix $A = (\alpha_{ij})$ is said to be a diagonal matrix if $\alpha_{ij} = 0$ for $i \neq j$,
that is, if all the entries off the main diagonal are 0. A matrix (or linear
transformation) is said to be diagonalizable if it is similar to a diagonal
matrix (has a basis in which its matrix is diagonal).

14. If T is in $A(V)$ then T is diagonalizable (if all its characteristic roots
are in F) if and only if whenever $v(T - \lambda)^m = 0$, for $v \in V$ and
$\lambda \in F$, then $v(T - \lambda) = 0$.

15. Using the result of Problem 14, prove that if $E^2 = E$ then E is
diagonalizable.

16. If $E^2 = E$ and $F^2 = F$ prove that they are similar if and only if they
have the same rank.

17. If the multiplicity of each characteristic root of T is 1, and if all the
characteristic roots of T are in F, prove that T is diagonalizable
over F.

18. If the characteristic of F is 0 and if $T \in A_F(V)$ satisfies $T^m = 1$,
prove that if the characteristic roots of T are in F then T is
diagonalizable. (Hint: Use the Jordan form of T.)

19. If $A, B \in F_n$ are diagonalizable and if they commute, prove that
there is an element $C \in F_n$ such that both $CAC^{-1}$ and $CBC^{-1}$ are
diagonal.

20. Prove that the result of Problem 19 is false if A and B do not commute. 


6.7 Canonical Forms: Rational Canonical Form 

The Jordan form is the one most generally used to prove theorems about
linear transformations and matrices. Unfortunately, it has one distinct,
serious drawback in that it puts requirements on the location of the
characteristic roots. True, if $T \in A_F(V)$ (or $A \in F_n$) does not have its characteristic
roots in F we need but go to a finite extension, K, of F in which all the
characteristic roots of T lie and then to bring T to Jordan form over K. In
fact, this is a standard operating procedure; however, it proves the result
in $K_n$ and not in $F_n$. Very often the result in $F_n$ can be inferred from that
in $K_n$, but there are many occasions when, after a result has been established
for $A \in F_n$, considered as an element in $K_n$, we cannot go back from $K_n$ to
get the desired information in $F_n$.

Thus we need some canonical form for elements in $A_F(V)$ (or in $F_n$)
which presumes nothing about the location of the characteristic roots of its
elements, a canonical form and a set of invariants created in $A_F(V)$ itself
using only its elements and operations. Such a canonical form is provided
us by the rational canonical form which is described below in Theorem 6.7.1
and its corollary.

Let $T \in A_F(V)$; by means of T we propose to make V into a module over
$F[x]$, the ring of polynomials in x over F. We do so by defining, for any
polynomial $f(x)$ in $F[x]$, and any $v \in V$, $f(x)v = vf(T)$. We leave the
verification to the reader that, under this definition of multiplication of
elements of V by elements of $F[x]$, V becomes an $F[x]$-module.
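The module action can be sketched in a few lines of Python. Vectors are written as rows and multiplied on the right by the matrix of T, following the convention vT of the text; the names `vec_mat` and `act`, the coefficient-list encoding of polynomials, and the sample matrix are assumptions chosen for illustration.

```python
def vec_mat(v, m):
    """Row vector times matrix, matching the convention vT of the text."""
    return [sum(v[k] * m[k][j] for k in range(len(v)))
            for j in range(len(m[0]))]


def act(coeffs, v, t):
    """Return f(x)v = v f(T), where f(x) = coeffs[0] + coeffs[1] x + ..."""
    result = [0] * len(v)
    power = list(v)                      # holds v T^i at step i
    for c in coeffs:
        result = [r + c * p for r, p in zip(result, power)]
        power = vec_mat(power, t)
    return result


T = [[0, 1], [-1, 0]]    # sample matrix with T^2 = -1
v = [1, 2]
print(act([0, 1], v, T))        # x v = vT
print(act([1, 0, 1], v, T))     # (1 + x^2) v = v + vT^2
```

Since this particular T satisfies $T^2 = -1$, the polynomial $1 + x^2$ sends every vector to 0, which the second call illustrates.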

Since V is finite-dimensional over F, it is finitely generated over F, hence,
all the more so over $F[x]$ which contains F. Moreover, $F[x]$ is a Euclidean
ring; thus as a finitely generated module over $F[x]$, by Theorem 4.5.1, V is
the direct sum of a finite number of cyclic submodules. From the very way
in which we have introduced the module structure on V, each of these
cyclic submodules is invariant under T; moreover there is an element $m_0$,
in such a submodule M, such that every element m, in M, is of the form
$m = m_0 f(T)$ for some $f(x) \in F[x]$.

To determine the nature of T on V it will be, therefore, enough for us to 
know what T looks like on a cyclic submodule. This is precisely what we 
intend, shortly, to determine. 

But first to carry out a preliminary decomposition of V, as we did in 
Theorem 6.6.1, according to the decomposition of the minimal polynomial 
of T as a product of irreducible polynomials. 

Let the minimal polynomial of T over F be $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$,
where the $q_i(x)$ are distinct irreducible polynomials in $F[x]$ and where
each $e_i > 0$; then, as we saw earlier in Theorem 6.6.1, $V = V_1 \oplus V_2 \oplus \cdots \oplus V_k$
where each $V_i$ is invariant under T and where the minimal polynomial
of T on $V_i$ is $q_i(x)^{e_i}$. To solve the nature of a cyclic submodule for an
arbitrary T we see, from this discussion, that it suffices to settle it for a T
whose minimal polynomial is a power of an irreducible one.

We prove the 

LEMMA 6.7.1 Suppose that T, in $A_F(V)$, has as minimal polynomial over F the
polynomial $p(x) = \gamma_0 + \gamma_1 x + \cdots + \gamma_{r-1}x^{r-1} + x^r$. Suppose, further, that
V, as a module (as described above), is a cyclic module (that is, is cyclic relative to T).
Then there is a basis of V over F such that, in this basis, the matrix of T is

$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\gamma_0 & -\gamma_1 & -\gamma_2 & \cdots & -\gamma_{r-1} \end{pmatrix}.$$

Proof. Since V is cyclic relative to T, there exists a vector v in V such
that every element w, in V, is of the form $w = vf(T)$ for some $f(x)$ in $F[x]$.

Now if for some polynomial $s(x)$ in $F[x]$, $vs(T) = 0$, then for any w
in V, $ws(T) = (vf(T))s(T) = vs(T)f(T) = 0$; thus $s(T)$ annihilates all
of V and so $s(T) = 0$. But then $p(x) \mid s(x)$ since $p(x)$ is the minimal
polynomial of T. This remark implies that $v, vT, vT^2, \ldots, vT^{r-1}$ are linearly
independent over F, for if not, then $\alpha_0 v + \alpha_1 vT + \cdots + \alpha_{r-1}vT^{r-1} = 0$
with $\alpha_0, \ldots, \alpha_{r-1}$ in F, not all 0. But then
$v(\alpha_0 + \alpha_1 T + \cdots + \alpha_{r-1}T^{r-1}) = 0$,
hence by the above discussion $p(x) \mid (\alpha_0 + \alpha_1 x + \cdots + \alpha_{r-1}x^{r-1})$, which
is impossible since $p(x)$ is of degree r unless

$$\alpha_0 = \alpha_1 = \cdots = \alpha_{r-1} = 0.$$

Since $T^r = -\gamma_0 - \gamma_1 T - \cdots - \gamma_{r-1}T^{r-1}$, we immediately have that
$T^{r+k}$, for $k \geq 0$, is a linear combination of $1, T, \ldots, T^{r-1}$, and so $f(T)$,
for any $f(x) \in F[x]$, is a linear combination of $1, T, \ldots, T^{r-1}$ over F.
Since any w in V is of the form $w = vf(T)$ we get that w is a linear
combination of $v, vT, \ldots, vT^{r-1}$.

We have proved, in the above two paragraphs, that the elements $v, vT,
\ldots, vT^{r-1}$ form a basis of V over F. In this basis, as is immediately
verified, the matrix of T is exactly as claimed.


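DEFINITION If $f(x) = \gamma_0 + \gamma_1 x + \cdots + \gamma_{r-1}x^{r-1} + x^r$ is in $F[x]$,
then the $r \times r$ matrix

$$\begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -\gamma_0 & -\gamma_1 & -\gamma_2 & \cdots & -\gamma_{r-1} \end{pmatrix}$$

is called the companion matrix of $f(x)$. We write it as $C(f(x))$.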

Note that Lemma 6.7.1 says that if V is cyclic relative to T and if the minimal
polynomial of T in $F[x]$ is $p(x)$ then for some basis of V the matrix of T is $C(p(x))$.

Note further that the matrix $C(f(x))$, for any monic $f(x)$ in $F[x]$, satisfies
$f(x)$ and has $f(x)$ as its minimal polynomial. (See Problem 4 at the end of
this section; also Problem 29 at the end of Section 6.1.)
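A small Python sketch can check the claim that $C(f(x))$ satisfies $f(x)$ on one example. It uses the layout of the definition above (1's on the superdiagonal, negated coefficients in the last row); the function names and the sample polynomial $x^3 - 3x^2 + 2$ are assumptions made for illustration, and the check does not address minimality.

```python
def companion(gammas):
    """Companion matrix of f(x) = gammas[0] + gammas[1] x + ... + x^r.

    The leading coefficient 1 is implicit; r = len(gammas).
    """
    r = len(gammas)
    rows = [[1 if j == i + 1 else 0 for j in range(r)] for i in range(r - 1)]
    rows.append([-g for g in gammas])
    return rows


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def eval_monic(gammas, m):
    """Evaluate f(m) for the monic polynomial with lower coefficients gammas."""
    n = len(m)
    total = [[0] * n for _ in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for c in list(gammas) + [1]:
        total = [[total[i][j] + c * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = mat_mul(power, m)
    return total


C = companion([2, 0, -3])          # f(x) = x^3 - 3x^2 + 2
print(C)                           # [[0, 1, 0], [0, 0, 1], [-2, 0, 3]]
print(eval_monic([2, 0, -3], C))   # the 3 x 3 zero matrix
```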


We now prove the very important 


THEOREM 6.7.1 If T in $A_F(V)$ has as minimal polynomial $p(x) = q(x)^e$,
where $q(x)$ is a monic, irreducible polynomial in $F[x]$, then a basis of V over F can
be found in which the matrix of T is of the form

$$\begin{pmatrix} C(q(x)^{e_1}) & & & \\ & C(q(x)^{e_2}) & & \\ & & \ddots & \\ & & & C(q(x)^{e_r}) \end{pmatrix}$$

where $e = e_1 \geq e_2 \geq \cdots \geq e_r$.


Proof. Since V, as a module over $F[x]$, is finitely generated, and since
$F[x]$ is Euclidean, we can decompose V as $V = V_1 \oplus \cdots \oplus V_r$ where the
$V_i$ are cyclic modules. The $V_i$ are thus invariant under T; if $T_i$ is the
linear transformation induced by T on $V_i$, its minimal polynomial must be
a divisor of $p(x) = q(x)^e$ so is of the form $q(x)^{e_i}$. We can renumber the
spaces so that $e_1 \geq e_2 \geq \cdots \geq e_r$.

Now $q(T)^{e_1}$ annihilates each $V_i$, hence annihilates V, whence $q(T)^{e_1} = 0$.
Thus $e_1 \geq e$; since $e_1$ is clearly at most e we get that $e_1 = e$.

By Lemma 6.7.1, since each $V_i$ is cyclic relative to T, we can find a basis
such that the matrix of the linear transformation $T_i$ on $V_i$ is $C(q(x)^{e_i})$.
Thus by Theorem 6.6.1 a basis of V can be found so that the matrix of T
in this basis is

$$\begin{pmatrix} C(q(x)^{e_1}) & & & \\ & C(q(x)^{e_2}) & & \\ & & \ddots & \\ & & & C(q(x)^{e_r}) \end{pmatrix}.$$

COROLLARY If T in $A_F(V)$ has minimal polynomial $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$
over F, where $q_1(x), \ldots, q_k(x)$ are irreducible distinct polynomials in $F[x]$, then a
basis of V can be found in which the matrix of T is of the form

$$\begin{pmatrix} R_1 & & & \\ & R_2 & & \\ & & \ddots & \\ & & & R_k \end{pmatrix}$$

where each

$$R_i = \begin{pmatrix} C(q_i(x)^{e_{i1}}) & & \\ & \ddots & \\ & & C(q_i(x)^{e_{ir_i}}) \end{pmatrix}$$

where $e_i = e_{i1} \geq e_{i2} \geq \cdots \geq e_{ir_i}$.

Proof. By Theorem 6.6.1, V can be decomposed into the direct sum
$V = V_1 \oplus \cdots \oplus V_k$, where each $V_i$ is invariant under T and where
$T_i$, the linear transformation induced by T on $V_i$,
has as minimal polynomial $q_i(x)^{e_i}$. Using Lemma 6.5.1 and the theorem
just proved, we obtain the corollary. If the degree of $q_i(x)$ is $d_i$, note that
the sum of all the $d_i e_{ij}$ is n, the dimension of V over F.


DEFINITION The matrix of T in the statement of the above corollary 
is called the rational canonical form of T. 
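To see what such a matrix looks like, the following Python sketch places companion matrices along the diagonal. The elementary divisors used, $(x - 2)^2$, $x - 2$, and $x^2 + 1$ over the rationals, are a made-up example, and the helper names `companion` and `block_diagonal` are assumptions rather than anything defined in the text.

```python
def companion(gammas):
    """Companion matrix of the monic polynomial gammas[0] + ... + x^r."""
    r = len(gammas)
    rows = [[1 if j == i + 1 else 0 for j in range(r)] for i in range(r - 1)]
    rows.append([-g for g in gammas])
    return rows


def block_diagonal(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    n = sum(len(b) for b in blocks)
    result = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                result[offset + i][offset + j] = entry
        offset += len(b)
    return result


# Lower coefficients of (x - 2)^2 = x^2 - 4x + 4, of x - 2, and of x^2 + 1.
R = block_diagonal([companion([4, -4]), companion([-2]), companion([1, 0])])
for row in R:
    print(row)
```

The degrees 2, 1, and 2 of the three divisors add up to 5, the size of the resulting matrix, in line with the dimension count at the end of the proof above.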

DEFINITION The polynomials $q_1(x)^{e_{11}}, q_1(x)^{e_{12}}, \ldots, q_1(x)^{e_{1r_1}}, \ldots, q_k(x)^{e_{k1}},
\ldots, q_k(x)^{e_{kr_k}}$ in $F[x]$ are called the elementary divisors of T.


One more definition! 





DEFINITION If $\dim_F(V) = n$, then the characteristic polynomial of T,
$p_T(x)$, is the product of its elementary divisors.

We shall be able to identify the characteristic polynomial just defined 
with another polynomial which we shall explicitly construct in Section 6.9. 
The characteristic polynomial of T is a polynomial of degree n lying in 
F[x]. It has many important properties, one of which is contained in the 

REMARK Every linear transformation $T \in A_F(V)$ satisfies its characteristic
polynomial. Every characteristic root of T is a root of $p_T(x)$.

Note 1. The first sentence of this remark is the statement of a very famous
theorem, the Cayley-Hamilton theorem. However, to call it that in the form
we have given is a little unfair. The meat of the Cayley-Hamilton theorem
is the fact that T satisfies $p_T(x)$ when $p_T(x)$ is given in a very specific,
concrete form, easily constructible from T. However, even as it stands the
remark does have some meat in it, for since the characteristic polynomial is
a polynomial of degree n, we have shown that every element in $A_F(V)$ does
satisfy a polynomial of degree n lying in $F[x]$. Until now, we had only
proved this (in Theorem 6.4.2) for linear transformations having all their
characteristic roots in F.

Note 2. As stated the second sentence really says nothing, for whenever T
satisfies a polynomial then every characteristic root of T satisfies this same
polynomial; thus $p_T(x)$ would be nothing special if what were stated in the
theorem were all that held true for it. However, the actual story is the
following: Every characteristic root of T is a root of $p_T(x)$, and conversely,
every root of $p_T(x)$ is a characteristic root of T; moreover, the multiplicity of any
root of $p_T(x)$, as a root of the polynomial, equals its multiplicity as a characteristic
root of T. We could prove this now, but defer the proof until later when we
shall be able to do it in a more natural fashion.

Proof of the Remark. We only have to show that T satisfies $p_T(x)$, but
this becomes almost trivial. Since $p_T(x)$ is the product of $q_1(x)^{e_{11}}, q_1(x)^{e_{12}},
\ldots, q_k(x)^{e_{k1}}, \ldots$, and since $e_{11} = e_1, e_{21} = e_2, \ldots, e_{k1} = e_k$, $p_T(x)$ is
divisible by $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$, the minimal polynomial of T. Since
$p(T) = 0$ it follows that $p_T(T) = 0$.
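The argument can be checked numerically on a small case. The Python sketch below, under the assumption of the made-up elementary divisors $(x - 2)^2$, $x - 2$, $x^2 + 1$ over the rationals, multiplies the divisors to get the characteristic polynomial and evaluates it at a matrix in rational canonical form. All helper names are illustrative, and polynomials are stored as coefficient lists with the constant term first.

```python
def companion(gammas):
    """Companion matrix of the monic polynomial gammas[0] + ... + x^r."""
    r = len(gammas)
    rows = [[1 if j == i + 1 else 0 for j in range(r)] for i in range(r - 1)]
    rows.append([-g for g in gammas])
    return rows


def block_diagonal(blocks):
    """Place square blocks along the diagonal of an otherwise zero matrix."""
    n = sum(len(b) for b in blocks)
    result = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                result[offset + i][offset + j] = entry
        offset += len(b)
    return result


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def poly_at(f, m):
    """Evaluate the polynomial f at the square matrix m."""
    n = len(m)
    total = [[0] * n for _ in range(n)]
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    for c in f:
        total = [[total[i][j] + c * power[i][j] for j in range(n)]
                 for i in range(n)]
        power = mat_mul(power, m)
    return total


# Full coefficient lists (leading 1 included) of the assumed divisors.
divisors = [[4, -4, 1], [-2, 1], [1, 0, 1]]
R = block_diagonal([companion(d[:-1]) for d in divisors])

char_poly = [1]
for d in divisors:
    char_poly = poly_mul(char_poly, d)      # product of elementary divisors
min_poly = poly_mul(divisors[0], divisors[2])  # (x - 2)^2 (x^2 + 1)

zero = [[0] * len(R) for _ in range(len(R))]
print(poly_at(char_poly, R) == zero)   # True
print(poly_at(min_poly, R) == zero)    # True
```

In this example the characteristic polynomial has degree 5, the size of R, while the minimal polynomial has degree 4 and divides it.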

We have called the set of polynomials arising in the rational canonical 
form of T the elementary divisors of T. It would be highly desirable if these 
determined similarity in $A_F(V)$, for then the similarity classes in $A_F(V)$
would be in one-to-one correspondence with sets of polynomials in $F[x]$.
We propose to do this, but first we establish a result which implies that two 
linear transformations have the same elementary divisors. 

THEOREM 6.7.2 Let V and W be two vector spaces over F and suppose that $\psi$
is a vector space isomorphism of V onto W. Suppose that $S \in A_F(V)$ and
$T \in A_F(W)$ are such that for any $v \in V$, $(vS)\psi = (v\psi)T$. Then S and T have the
same elementary divisors.

Proof. We begin with a simple computation. If $v \in V$, then $(vS^2)\psi =
((vS)S)\psi = ((vS)\psi)T = ((v\psi)T)T = (v\psi)T^2$. Clearly, if we continue in
this pattern we get $(vS^m)\psi = (v\psi)T^m$ for any integer $m \geq 0$ whence for
any polynomial $f(x) \in F[x]$ and for any $v \in V$, $(vf(S))\psi = (v\psi)f(T)$.

If $f(S) = 0$ then $(v\psi)f(T) = 0$ for any $v \in V$, and since $\psi$ maps V
onto W, we would have that $Wf(T) = (0)$, in consequence of which
$f(T) = 0$. Conversely, if $g(x) \in F[x]$ is such that $g(T) = 0$, then for any
$v \in V$, $(vg(S))\psi = 0$, and since $\psi$ is an isomorphism, this results in
$vg(S) = 0$. This, of course, implies that $g(S) = 0$. Thus S and T satisfy
the same set of polynomials in $F[x]$, hence must have the same minimal polynomial

$$p(x) = q_1(x)^{e_1} q_2(x)^{e_2} \cdots q_k(x)^{e_k}$$

where $q_1(x), \ldots, q_k(x)$ are distinct irreducible polynomials in $F[x]$.

If U is a subspace of V invariant under S, then $U\psi$ is a subspace of W
invariant under T, for $(U\psi)T = (US)\psi \subset U\psi$. Since U and $U\psi$ are
isomorphic, the minimal polynomial of $S_1$, the linear transformation induced
by S on U, is the same, by the remarks above, as the minimal polynomial of
$T_1$, the linear transformation induced on $U\psi$ by T.

Now, since the minimal polynomial for S on V is $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$,
as we have seen in Theorem 6.7.1 and its corollary, we can take as the
first elementary divisor of S the polynomial $q_1(x)^{e_1}$ and we can find a
subspace $V_1$ of V which is invariant under S such that

1. $V = V_1 \oplus M$ where M is invariant under S.

2. The only elementary divisor of $S_1$, the linear transformation induced
on $V_1$ by S, is $q_1(x)^{e_1}$.

3. The other elementary divisors of S are those of the linear transformation
$S_2$ induced by S on M.

We now combine the remarks made above and assert

1. $W = W_1 \oplus N$ where $W_1 = V_1\psi$ and $N = M\psi$ are invariant under T.

2. The only elementary divisor of $T_1$, the linear transformation induced
by T on $W_1$, is $q_1(x)^{e_1}$ (which is an elementary divisor of T since the minimal
polynomial of T is $p(x) = q_1(x)^{e_1} \cdots q_k(x)^{e_k}$).

3. The other elementary divisors of T are those of the linear transformation
$T_2$ induced by T on N.

Since $N = M\psi$, M and N are isomorphic vector spaces over F under the
isomorphism $\psi_2$ induced by $\psi$. Moreover, if $u \in M$ then $(uS_2)\psi_2 =
(uS)\psi = (u\psi)T = (u\psi_2)T_2$, hence $S_2$ and $T_2$ are in the same relation
vis-a-vis $\psi_2$ as S and T were vis-a-vis $\psi$. By induction on dimension (or
repeating the argument) $S_2$ and $T_2$ have the same elementary divisors.
But since the elementary divisors of S are merely $q_1(x)^{e_1}$ and those of $S_2$
while those of T are merely $q_1(x)^{e_1}$ and those of $T_2$, S and T must have
the same elementary divisors, thereby proving the theorem.

Theorem 6.7.1 and its corollary gave us the rational canonical form and 
gave rise to the elementary divisors. We should like to push this further 
and to be able to assert some uniqueness property. This we do in 

THEOREM 6.7.3 The elements S and T in $A_F(V)$ are similar in $A_F(V)$ if
and only if they have the same elementary divisors.

Proof. In one direction this is easy, for suppose that S and T have the 
same elementary divisors. Then there are two bases of V over F such that 
the matrix of S in the first basis equals the matrix of T in the second (and 
each equals the matrix of the rational canonical form). But as we have 
seen several times earlier, this implies that S and T are similar. 

We now wish to go in the other direction. Here, too, the argument 
resembles closely that used in Section 6.5 in the proof of Theorem 6.5.2. 
Having been careful with details there, we can afford to be a little sketchier 
here. 

We first remark that in view of Theorem 6.6.1 we may reduce from the
general case to that of a linear transformation whose minimal polynomial
is a power of an irreducible one. Thus without loss of generality we may
suppose that the minimal polynomial of T is $q(x)^e$ where $q(x)$ is irreducible
in $F[x]$ of degree d.

The rational canonical form tells us that we can decompose V as $V =
V_1 \oplus \cdots \oplus V_r$, where the subspaces $V_i$ are invariant under T and where
the linear transformation induced by T on $V_i$ has as matrix $C(q(x)^{e_i})$, the
companion matrix of $q(x)^{e_i}$. We assume that what we are really trying to
prove is the following: If $V = U_1 \oplus U_2 \oplus \cdots \oplus U_s$ where the $U_j$ are
invariant under T and where the linear transformation induced by T on $U_j$
has as matrix $C(q(x)^{f_j})$, $f_1 \geq f_2 \geq \cdots \geq f_s$, then $r = s$ and $e_1 = f_1$,
$e_2 = f_2, \ldots, e_r = f_r$. (Prove that the proof of this is equivalent to proving
the theorem!)

Suppose then that we do have the two decompositions described above,
$V = V_1 \oplus \cdots \oplus V_r$ and $V = U_1 \oplus \cdots \oplus U_s$, and that some $e_i \neq f_i$.
Then there is a first integer m such that $e_m \neq f_m$, while $e_1 = f_1, \ldots, e_{m-1} =
f_{m-1}$. We may suppose that $e_m > f_m$.

Now $q(T)^{f_m}$ annihilates $U_m, U_{m+1}, \ldots, U_s$, whence

$$Vq(T)^{f_m} = U_1 q(T)^{f_m} \oplus \cdots \oplus U_{m-1} q(T)^{f_m}.$$

However, it can be shown that the dimension of $U_i q(T)^{f_m}$ for $i < m$ is
$d(f_i - f_m)$ (Prove!) whence

$$\dim(Vq(T)^{f_m}) = d(f_1 - f_m) + \cdots + d(f_{m-1} - f_m).$$

On the other hand, $Vq(T)^{f_m} \supset V_1 q(T)^{f_m} \oplus \cdots \oplus V_m q(T)^{f_m}$ and
since $V_i q(T)^{f_m}$ has dimension $d(e_i - f_m)$, for $i \leq m$, we obtain that

$$\dim(Vq(T)^{f_m}) \geq d(e_1 - f_m) + \cdots + d(e_m - f_m).$$

Since $e_1 = f_1, \ldots, e_{m-1} = f_{m-1}$ and $e_m > f_m$, this contradicts the equality
proved above. We have thus proved the theorem.

COROLLARY 1 Suppose the two matrices A, B in $F_n$ are similar in $K_n$ where
K is an extension of F. Then A and B are already similar in $F_n$.

Proof. Suppose that $A, B \in F_n$ are such that $B = C^{-1}AC$ with $C \in K_n$.
We consider $K_n$ as acting on $K^{(n)}$, the vector space of n-tuples over K.
Thus $F^{(n)}$ is contained in $K^{(n)}$ and although it is a vector space over F it is
not a vector space over K. The image of $F^{(n)}$, in $K^{(n)}$, under C need not fall
back in $F^{(n)}$ but at any rate $F^{(n)}C$ is a subset of $K^{(n)}$ which is a vector space
over F. (Prove!) Let V be the vector space $F^{(n)}$ over F, W the vector space
$F^{(n)}C$ over F, and for $v \in V$ let $v\psi = vC$. Now $A \in A_F(V)$ and $B \in A_F(W)$
and for any $v \in V$, $(vA)\psi = vAC = vCB = (v\psi)B$ whence the conditions
of Theorem 6.7.2 are satisfied. Thus A and B have the same elementary
divisors; by Theorem 6.7.3, A and B must be similar in $F_n$.

A word of caution: The corollary does not state that if $A, B \in F_n$ are such
that $B = C^{-1}AC$ with $C \in K_n$ then C must of necessity be in $F_n$; this is
false. It merely states that if $A, B \in F_n$ are such that $B = C^{-1}AC$ with
$C \in K_n$ then there exists a (possibly different) $D \in F_n$ such that
$B = D^{-1}AD$.


Problems 

1. Verify that V becomes an $F[x]$-module under the definition given.

2. In the proof of Theorem 6.7.3 provide complete proof at all points
marked “(Prove).”

*3. (a) Prove that every root of the characteristic polynomial of T is a
characteristic root of T.

(b) Prove that the multiplicity of any root of $p_T(x)$ is equal to its
multiplicity as a characteristic root of T.

4. Prove that for $f(x) \in F[x]$, $C(f(x))$ satisfies $f(x)$ and has $f(x)$ as its
minimal polynomial. What is its characteristic polynomial?

5. If F is the field of rational numbers, find all possible rational canonical
forms and elementary divisors for

(a) The $6 \times 6$ matrices in $F_6$ having $(x - 1)(x^2 + 1)^2$ as minimal
polynomial.

(b) The $15 \times 15$ matrices in $F_{15}$ having $(x^2 + x + 1)^2(x^3 + 2)^2$
as minimal polynomial.

(c) The $10 \times 10$ matrices in $F_{10}$ having $(x^2 + 1)^2(x^3 + 1)$ as
minimal polynomial.

6. (a) If K is an extension of F and if A is in $K_n$, prove that A can be
written as $A = \lambda_1 A_1 + \cdots + \lambda_k A_k$ where $A_1, \ldots, A_k$ are in $F_n$
and where $\lambda_1, \ldots, \lambda_k$ are in K and are linearly independent over
F.

(b) With the notation as in part (a), prove that if $B \in F_n$ is such that
$AB = 0$ then $A_1 B = A_2 B = \cdots = A_k B = 0$.

(c) If C in $F_n$ commutes with A prove that C commutes with each
of $A_1, A_2, \ldots, A_k$.

*7. If $A_1, \ldots, A_k$ are in $F_n$ and are such that for some $\lambda_1, \ldots, \lambda_k$ in K,
an extension of F, $\lambda_1 A_1 + \cdots + \lambda_k A_k$ is invertible in $K_n$, prove that
if F has an infinite number of elements we can find $\alpha_1, \ldots, \alpha_k$ in F such
that $\alpha_1 A_1 + \cdots + \alpha_k A_k$ is invertible in $F_n$.

*8. If F is a finite field prove the result of Problem 7 is false.

*9. Using the results of Problems 6(a) and 7 prove that if F has an infinite
number of elements then whenever $A, B \in F_n$ are similar in $K_n$, where
K is an extension of F, then they are similar in $F_n$. (This provides us
with a proof, independent of canonical forms, of Corollary 1 to Theorem
6.7.3 in the special case when F is an infinite field.)

10. Using matrix computations (but following the lines laid out in Problem
9), prove that if F is the field of real numbers and K that of complex
numbers, then two elements in $F_2$ which are similar in $K_2$ are already
similar in $F_2$.

6.8 Trace and Transpose 

After the rather heavy going of the previous few sections, the uncomplicated 
nature of the material to be treated now should come as a welcome respite. 
Let F be a field and let A be a matrix in $F_n$.

DEFINITION The trace of A is the sum of the elements on the main
diagonal of A.

We shall write the trace of A as tr A; if $A = (\alpha_{ij})$, then

$$\operatorname{tr} A = \sum_{i=1}^{n} \alpha_{ii}.$$

The fundamental formal properties of the trace function are contained in

LEMMA 6.8.1 For $A, B \in F_n$ and $\lambda \in F$,

1. $\operatorname{tr}(\lambda A) = \lambda \operatorname{tr} A$.

2. $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$.

3. $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

Proof. To establish parts 1 and 2 (which assert that the trace is a linear
functional on $F_n$) is straightforward and is left to the reader. We only
present the proof of part 3 of the lemma.

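If $A = (\alpha_{ij})$ and $B = (\beta_{ij})$ then $AB = (\gamma_{ij})$ where

$$\gamma_{ij} = \sum_{k=1}^{n} \alpha_{ik}\beta_{kj}$$

and $BA = (\mu_{ij})$ where

$$\mu_{ij} = \sum_{k=1}^{n} \beta_{ik}\alpha_{kj}.$$

Thus

$$\operatorname{tr}(AB) = \sum_{i=1}^{n} \gamma_{ii} = \sum_{i=1}^{n} \Big( \sum_{k=1}^{n} \alpha_{ik}\beta_{ki} \Big);$$

if we interchange the order of summation in this last sum, we get

$$\operatorname{tr}(AB) = \sum_{k=1}^{n} \sum_{i=1}^{n} \alpha_{ik}\beta_{ki} = \sum_{k=1}^{n} \Big( \sum_{i=1}^{n} \beta_{ki}\alpha_{ik} \Big) = \sum_{k=1}^{n} \mu_{kk} = \operatorname{tr}(BA).$$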

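COROLLARY If A is invertible then $\operatorname{tr}(ACA^{-1}) = \operatorname{tr} C$.

Proof. Let $B = CA^{-1}$; then $\operatorname{tr}(ACA^{-1}) = \operatorname{tr}(AB) = \operatorname{tr}(BA) =
\operatorname{tr}(CA^{-1}A) = \operatorname{tr} C$.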
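Both part 3 of the lemma and its corollary are easy to spot-check. The Python sketch below uses small integer matrices chosen only for illustration; `P` and `P_inv` are a hand-picked invertible matrix and its inverse, so no general inversion routine is needed.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a):
    """Sum of the entries on the main diagonal."""
    return sum(a[i][i] for i in range(len(a)))


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]
print(mat_mul(A, B) == mat_mul(B, A))                 # False: A and B do not commute
print(trace(mat_mul(A, B)) == trace(mat_mul(B, A)))   # True

P = [[1, 1], [0, 1]]
P_inv = [[1, -1], [0, 1]]       # inverse of P, checked in the test below
C = [[7, -3], [2, 5]]
conjugate = mat_mul(mat_mul(P, C), P_inv)
print(trace(conjugate) == trace(C))                   # True
```

The products AB and BA differ as matrices, yet their traces agree, which is exactly the content of part 3.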

This corollary has a twofold importance; first, it will allow us to define 
the trace of an arbitrary linear transformation; secondly, it will enable us 
to find an alternative expression for the trace of A. 

DEFINITION If $T \in A(V)$ then tr T, the trace of T, is the trace of $m_1(T)$
where $m_1(T)$ is the matrix of T in some basis of V.

We claim that the definition is meaningful and depends only on T and
not on any particular basis of V. For if $m_1(T)$ and $m_2(T)$ are the matrices
of T in two different bases of V, by Theorem 6.3.2, $m_1(T)$ and $m_2(T)$ are
similar matrices, so by the corollary to Lemma 6.8.1 they have the same
trace.

LEMMA 6.8.2 If $T \in A(V)$ then tr T is the sum of the characteristic roots of
T (using each characteristic root as often as its multiplicity).

Proof. We can assume that T is a matrix in $F_n$; if K is the splitting field
for the minimal polynomial of T over F, then in $K_n$, by Theorem 6.6.2, T
can be brought to its Jordan form, J. J is a matrix on whose diagonal
appear the characteristic roots of T, each root appearing as often as its
multiplicity. Thus tr J = sum of the characteristic roots of T; however,
since J is of the form $ATA^{-1}$, tr J = tr T, and this proves the lemma.

If T is nilpotent then all its characteristic roots are 0, whence by Lemma
6.8.2, tr T = 0. But if T is nilpotent, then so are $T^2, T^3, \ldots$; thus
$\operatorname{tr} T^i = 0$ for all $i \geq 1$.

What about other directions, namely, if $\operatorname{tr} T^i = 0$ for $i = 1, 2, \ldots$
does it follow that T is nilpotent? In this generality the answer is no, for
if F is a field of characteristic 2 then the unit matrix

$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

in $F_2$ has trace 0 (for 1 + 1 = 0) as do all its powers, yet clearly the unit
matrix is not nilpotent. However, if we restrict the characteristic of F to
be 0, the result is indeed true.
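The characteristic 2 counterexample can be replayed with arithmetic modulo 2. This is a minimal Python sketch in which the integers mod 2 stand in for the field F; the helper names are assumptions made for the example.

```python
def mat_mul_mod(a, b, p):
    """Multiply two matrices and reduce every entry modulo p."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) % p
             for j in range(len(b[0]))] for i in range(len(a))]


def trace_mod(a, p):
    """Trace of a, reduced modulo p."""
    return sum(a[i][i] for i in range(len(a))) % p


identity = [[1, 0], [0, 1]]
power = identity
traces = []
for _ in range(5):
    traces.append(trace_mod(power, 2))       # trace of the current power, mod 2
    power = mat_mul_mod(power, identity, 2)

print(traces)              # every trace is 0 modulo 2
print(power == identity)   # True: no power of the unit matrix is 0
```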

LEMMA 6.8.3 If F is a field of characteristic 0, and if $T \in A_F(V)$ is such
that $\operatorname{tr} T^i = 0$ for all $i \geq 1$ then T is nilpotent.

Proof. Since $T \in A_F(V)$, T satisfies some minimal polynomial $p(x) =
x^m + \alpha_1 x^{m-1} + \cdots + \alpha_m$; from $T^m + \alpha_1 T^{m-1} + \cdots + \alpha_{m-1}T + \alpha_m = 0$,
taking traces of both sides yields

$$\operatorname{tr} T^m + \alpha_1 \operatorname{tr} T^{m-1} + \cdots + \alpha_{m-1} \operatorname{tr} T + \operatorname{tr} \alpha_m = 0.$$

However, by assumption, $\operatorname{tr} T^i = 0$ for $i \geq 1$, thus we get $\operatorname{tr} \alpha_m = 0$; if
$\dim V = n$, $\operatorname{tr} \alpha_m = n\alpha_m$ whence $n\alpha_m = 0$. But the characteristic of F is 0;
therefore, $n \neq 0$, hence it follows that $\alpha_m = 0$. Since the constant term
of the minimal polynomial of T is 0, by Theorem 6.1.2 T is singular and
so 0 is a characteristic root of T.

We can consider T as a matrix in $F_n$ and therefore also as a matrix in $K_n$,
where K is an extension of F which in turn contains all the characteristic
roots of T. In $K_n$, by Theorem 6.4.1, we can bring T to triangular form,
and since 0 is a characteristic root of T, we can actually bring it to the form

$$\begin{pmatrix} 0 & 0 & \cdots & 0 \\ \beta_2 & \alpha_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ \beta_n & * & \cdots & \alpha_n \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ * & T_2 \end{pmatrix},$$

where

$$T_2 = \begin{pmatrix} \alpha_2 & & 0 \\ & \ddots & \\ * & & \alpha_n \end{pmatrix}$$

is an $(n - 1) \times (n - 1)$ matrix (the *'s indicate parts in which we are
not interested in the explicit entries). Now

$$T^k = \begin{pmatrix} 0 & 0 \\ * & T_2^k \end{pmatrix},$$

hence $0 = \operatorname{tr} T^k = \operatorname{tr} T_2^k$. Thus $T_2$ is an $(n - 1) \times (n - 1)$ matrix with
the property that $\operatorname{tr} T_2^k = 0$ for all $k \geq 1$. Either using induction on n,
or repeating the argument on $T_2$ used for T, we get, since $\alpha_2, \ldots, \alpha_n$ are
the characteristic roots of $T_2$, that $\alpha_2 = \cdots = \alpha_n = 0$. Thus when T is
brought to triangular form, all its entries on the main diagonal are 0,
forcing T to be nilpotent. (Prove!)


This lemma, though it might seem to be special, will serve us in good 
stead often. We make immediate use of it to prove a result usually known 
as the Jacobson lemma. 


LEMMA 6.8.4 If F is of characteristic 0 and if S and T, in $A_F(V)$, are such
that $ST - TS$ commutes with S, then $ST - TS$ is nilpotent.

Proof. For any $k \geq 1$ we compute $(ST - TS)^k$. Now $(ST - TS)^k =
(ST - TS)^{k-1}(ST - TS) = (ST - TS)^{k-1}ST - (ST - TS)^{k-1}TS$.
Since $ST - TS$ commutes with S, the term $(ST - TS)^{k-1}ST$ can be
written in the form $S((ST - TS)^{k-1}T)$. If we let $B = (ST - TS)^{k-1}T$,
we see that $(ST - TS)^k = SB - BS$; hence $\operatorname{tr}((ST - TS)^k) =
\operatorname{tr}(SB - BS) = \operatorname{tr}(SB) - \operatorname{tr}(BS) = 0$ by Lemma 6.8.1. The previous
lemma now tells us that $ST - TS$ must be nilpotent.
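For a concrete instance of the Jacobson lemma, take two $3 \times 3$ matrix units. The Python sketch below is an illustrative check over the integers, with the matrices S and T chosen by hand for this example; it verifies the hypothesis and the conclusion directly.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_sub(a, b):
    """Entrywise difference of two matrices of the same shape."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


S = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]   # 1 in position (1, 2)
T = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]   # 1 in position (2, 3)

D = mat_sub(mat_mul(S, T), mat_mul(T, S))   # D = ST - TS
print(D)                                     # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(mat_mul(D, S) == mat_mul(S, D))        # True: D commutes with S
print(mat_mul(D, D))                         # the zero matrix, so D is nilpotent
```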

The trace provides us with an extremely useful linear functional on $F_n$
(and so, on $A_F(V)$) into F. We now introduce an important mapping of
$F_n$ into itself.

DEFINITION If $A = (\alpha_{ij}) \in F_n$ then the transpose of A, written as $A'$,
is the matrix $A' = (\gamma_{ij})$ where $\gamma_{ij} = \alpha_{ji}$ for each i and j.

The transpose of A is the matrix obtained by interchanging the rows and
columns of A. The basic formal properties of the transpose are contained in

LEMMA 6.8.5 For all $A, B \in F_n$,

1. $(A')' = A$.

2. $(A + B)' = A' + B'$.

3. $(AB)' = B'A'$.

Proof. The proofs of parts 1 and 2 are straightforward and are left to 
the reader; we content ourselves with proving part 3. 

Suppose that $A = (\alpha_{ij})$ and $B = (\beta_{ij})$; then $AB = (\lambda_{ij})$ where

$$\lambda_{ij} = \sum_{k=1}^{n} \alpha_{ik}\beta_{kj}.$$

Therefore, by definition, $(AB)' = (\mu_{ij})$, where

$$\mu_{ij} = \lambda_{ji} = \sum_{k=1}^{n} \alpha_{jk}\beta_{ki}.$$

On the other hand, $A' = (\gamma_{ij})$ where $\gamma_{ij} = \alpha_{ji}$ and $B' = (\xi_{ij})$ where
$\xi_{ij} = \beta_{ji}$, whence the (i, j) element of $B'A'$ is

$$\sum_{k=1}^{n} \xi_{ik}\gamma_{kj} = \sum_{k=1}^{n} \beta_{ki}\alpha_{jk} = \sum_{k=1}^{n} \alpha_{jk}\beta_{ki} = \mu_{ij}.$$

That is, $(AB)' = B'A'$ and we have verified part 3 of the lemma.

In part 3, if we specialize A = B we obtain $(A^2)' = (A')^2$. Continuing,
we obtain $(A^k)' = (A')^k$ for all positive integers k. When A is invertible
then $(A^{-1})' = (A')^{-1}$.
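The reversal of order in part 3 is easy to see on a small example. The Python sketch below uses two hand-picked integer matrices; the names `transpose` and `mat_mul` are illustrative helpers, not notation from the text.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Interchange the rows and columns of a."""
    return [list(column) for column in zip(*a)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

left = transpose(mat_mul(A, B))                  # (AB)'
right = mat_mul(transpose(B), transpose(A))      # B'A'
wrong = mat_mul(transpose(A), transpose(B))      # A'B', the order not reversed

print(left == right)   # True
print(left == wrong)   # False for this pair, since A and B do not commute
```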

There is a further property enjoyed by the transpose, namely, if $\lambda \in F$
then $(\lambda A)' = \lambda A'$ for all $A \in F_n$. Now, if $A \in F_n$ satisfies a polynomial
$\alpha_0 A^m + \alpha_1 A^{m-1} + \cdots + \alpha_m = 0$, we obtain $(\alpha_0 A^m + \cdots + \alpha_m)' = 0' = 0$.
Computing out $(\alpha_0 A^m + \cdots + \alpha_m)'$ using the properties of the transpose,
we obtain $\alpha_0 (A')^m + \alpha_1 (A')^{m-1} + \cdots + \alpha_m = 0$, that is to say, $A'$ satisfies
any polynomial over F which is satisfied by A. Since $A = (A')'$, by the
same token, A satisfies any polynomial over F which is satisfied by $A'$.
In particular, A and $A'$ have the same minimal polynomial over F and so
they have the same characteristic roots. One can show each root occurs with
the same multiplicity in A and $A'$. This is evident once it is established that
A and $A'$ are actually similar (see Problem 14).

DEFINITION The matrix A is said to be a symmetric matrix if A' = A. 

DEFINITION The matrix A is said to be a skew-symmetric matrix if 
A' = —A. 

When the characteristic of $F$ is 2, since $1 = -1$, we would not be able
to distinguish between symmetric and skew-symmetric matrices. We make
the flat assumption for the remainder of this section that the characteristic of $F$ is
different from 2.

Ready ways for producing symmetric and skew-symmetric matrices are 
available to us. For instance, if A is an arbitrary matrix, then A + A' is 
symmetric and $A - A'$ is skew-symmetric. Noting that $A = \tfrac{1}{2}(A + A') +
\tfrac{1}{2}(A - A')$, every matrix is a sum of a symmetric one and a skew-symmetric
one. This decomposition is unique (see Problem 19). Another method of
producing symmetric matrices is as follows: if $A$ is an arbitrary matrix,
then both $AA'$ and $A'A$ are symmetric. (Note that these need not be equal.)
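
The decomposition $A = \tfrac{1}{2}(A + A') + \tfrac{1}{2}(A - A')$ can be tried out concretely. The sketch below works over the rationals (so the characteristic is not 2) using Python's `fractions.Fraction`; the function name `sym_skew_parts` and the sample matrix are assumptions made only for this illustration.

```python
from fractions import Fraction


def transpose(m):
    """Return the transpose of a square matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def sym_skew_parts(a):
    """Split a into (S, K) with S = (A + A')/2 symmetric and K = (A - A')/2 skew."""
    n = len(a)
    at = transpose(a)
    half = Fraction(1, 2)
    s = [[half * (a[i][j] + at[i][j]) for j in range(n)] for i in range(n)]
    k = [[half * (a[i][j] - at[i][j]) for j in range(n)] for i in range(n)]
    return s, k


# Sample matrix with rational entries, chosen arbitrarily.
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
S, K = sym_skew_parts(A)

# Recombine the two parts entry by entry.
recombined = [[S[i][j] + K[i][j] for j in range(2)] for i in range(2)]
print(S, K, recombined == A)
```

Dividing by 2 is exactly where the assumption on the characteristic of $F$ is used.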

It is in the nature of a mathematician, once given an interesting concept 
arising from a particular situation, to try to strip this concept away from 
the particularity of its origins and to employ the key properties of the con¬ 
cept as a means of abstracting it. We proceed to do this with the transpose. 
We take, as the formal properties of greatest interest, those properties of 
the transpose contained in the statement of Lemma 6.8.5 which asserts that 
on F n the transpose defines an anti-automorphism of period 2. This leads 
us to make the 

DEFINITION A mapping * from F n into F n is called an adjoint on F n if 

1. $(A^*)^* = A$;

2. $(A + B)^* = A^* + B^*$;

3. $(AB)^* = B^*A^*$;

for all $A, B \in F_n$.

Note that we do not insist that $(\lambda A)^* = \lambda A^*$ for $\lambda \in F$. In fact, in some
of the most interesting adjoints used, this is not the case. We discuss one
such now. Let $F$ be the field of complex numbers; for $A = (\alpha_{ij}) \in F_n$, let
$A^* = (\gamma_{ij})$ where $\gamma_{ij} = \bar{\alpha}_{ji}$, the complex conjugate of $\alpha_{ji}$. In this case $*$ is
usually called the Hermitian adjoint on $F_n$. A few sections from now, we
shall make a fairly extensive study of matrices under the Hermitian adjoint. 

Everything we said about transpose, e.g., symmetric, skew-symmetric, 
can be carried over to general adjoints, and we speak about elements symmetric
under $*$ (i.e., $A^* = A$), skew-symmetric under $*$, etc. In the exercises
at the end, there are many examples and problems referring to general 
adjoints. 

However, now as a diversion let us play a little with the Hermitian 
adjoint. We do not call anything we obtain a theorem, not because it is 
not worthy of the title, but rather because we shall redo it later (and properly 
label it) from one central point of view. 

So, let us suppose that F is the field of complex numbers and that the 
adjoint, $*$, on $F_n$ is the Hermitian adjoint. The matrix $A$ is called Hermitian
if $A^* = A$.

First remark: If $A \neq 0 \in F_n$, then $\operatorname{tr}(AA^*) > 0$. Second remark: As a
consequence of the first remark, if $A_1, \ldots, A_k \in F_n$ and if $A_1A_1^* + A_2A_2^* +
\cdots + A_kA_k^* = 0$, then $A_1 = A_2 = \cdots = A_k = 0$. Third remark: If $\lambda$
is a scalar matrix then $\lambda^* = \bar{\lambda}$, the complex conjugate of $\lambda$.
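
The first remark rests on the observation that $\operatorname{tr}(AA^*)$ is the sum of the squared absolute values of the entries of $A$. The sketch below checks this on one complex $2 \times 2$ matrix using Python's built-in complex numbers; the helper names and the sample matrix are assumptions made for the illustration.

```python
def hermitian_adjoint(m):
    """Return the conjugate transpose of a square complex matrix."""
    n = len(m)
    return [[m[j][i].conjugate() for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def trace(m):
    """Sum of the diagonal entries."""
    return sum(m[i][i] for i in range(len(m)))


# Sample matrix with small integer real and imaginary parts (exact in floats).
A = [[1 + 2j, 0j], [3j, 1 - 1j]]

t = trace(matmul(A, hermitian_adjoint(A)))
sum_of_squares = sum(abs(z) ** 2 for row in A for z in row)
print(t, sum_of_squares)
```

Since each $|\alpha_{ij}|^2$ is a nonnegative real number, the sum vanishes only when every entry does.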

Suppose that $A \in F_n$ is Hermitian and that the complex number $\alpha + \beta i$,
where $\alpha$ and $\beta$ are real and $i^2 = -1$, is a characteristic root of $A$. Thus
$A - (\alpha + \beta i)$ is not invertible; but then $(A - (\alpha + \beta i))(A - (\alpha - \beta i)) =
(A - \alpha)^2 + \beta^2$ is not invertible. However, if a matrix is singular, it must
annihilate a nonzero matrix (Theorem 6.1.2, Corollary 2). There must
therefore be a matrix $C \neq 0$ such that $C((A - \alpha)^2 + \beta^2) = 0$. We multiply
this from the right by $C^*$ and so obtain

$$C(A - \alpha)^2C^* + \beta^2CC^* = 0. \qquad (1)$$

Let $D = C(A - \alpha)$ and $E = \beta C$. Since $A^* = A$ and $\alpha$ is real,
$C(A - \alpha)^2C^* = DD^*$; since $\beta$ is real, $\beta^2CC^* = EE^*$. Thus equation
(1) becomes $DD^* + EE^* = 0$; by the remarks made above, this forces
$D = 0$ and $E = 0$. We only exploit the relation $E = 0$. Since $0 = E =
\beta C$ and since $C \neq 0$ we must have $\beta = 0$. What exactly have we proved?
In fact, we have proved the pretty (and important) result that if a complex
number $\lambda$ is a characteristic root of a Hermitian matrix, then $\lambda$ must be real.
Exploiting properties of the field of complex numbers, one can actually restate
this as follows: The characteristic roots of a Hermitian matrix are all real.

We continue a little farther in this vein. For $A \in F_n$, let $B = AA^*$; $B$
is a Hermitian matrix. If the real number $\alpha$ is a characteristic root of $B$,
can $\alpha$ be an arbitrary real number or must it be restricted in some way?
Indeed, we claim that $\alpha$ must be nonnegative. For if $\alpha$ were negative then
$\alpha = -\beta^2$, where $\beta$ is a nonzero real number. But then $B - \alpha = B + \beta^2 =
AA^* + \beta^2$ is not invertible, and there is a $C \neq 0$ such that $C(AA^* + \beta^2)
= 0$. Multiplying by $C^*$ from the right and arguing as before, we obtain
$\beta = 0$, a contradiction. We have shown that any real characteristic root
of $AA^*$ must be nonnegative. In actuality, the "real" in this statement
is superfluous and we could state: For any $A \in F_n$ all the characteristic
roots of $AA^*$ are nonnegative.

Problems 

Unless otherwise specified, symmetric and skew-symmetric refer to 
transpose. 

1. Prove that $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$ and that for $\lambda \in F$,
$\operatorname{tr}(\lambda A) = \lambda \operatorname{tr} A$.

2. (a) Using a trace argument, prove that if the characteristic of $F$ is 0
then it is impossible to find $A, B \in F_n$ such that $AB - BA = 1$.

(b) In part (a), prove, in fact, that $1 - (AB - BA)$ cannot be nilpotent.

3. (a) Let $f$ be a function defined on $F_n$ having its values in $F$ such that

1. $f(A + B) = f(A) + f(B)$;

2. $f(\lambda A) = \lambda f(A)$;

3. $f(AB) = f(BA)$;

for all $A, B \in F_n$ and all $\lambda \in F$. Prove that there is an element
$\alpha_0 \in F$ such that $f(A) = \alpha_0 \operatorname{tr} A$ for every $A$ in $F_n$.

(b) If the characteristic of $F$ is 0 and if the $f$ in part (a) satisfies the
additional property that $f(1) = n$, prove that $f(A) = \operatorname{tr} A$ for
all $A \in F_n$.

Note that Problem 3 characterizes the trace function. 

*4. (a) If the field $F$ has an infinite number of elements, prove that every
element in $F_n$ can be written as the sum of regular matrices.

(b) If $F$ has an infinite number of elements and if $f$, defined on $F_n$
and having its values in $F$, satisfies

1. $f(A + B) = f(A) + f(B)$;

2. $f(\lambda A) = \lambda f(A)$;

3. $f(BAB^{-1}) = f(A)$;

for every $A \in F_n$, $\lambda \in F$ and invertible element $B$ in $F_n$, prove
that $f(A) = \alpha_0 \operatorname{tr} A$ for a particular $\alpha_0 \in F$ and all $A \in F_n$.

5. Prove the Jacobson lemma for elements A, B e F n if n is less than 
the characteristic of F. 

6. (a) If $C \in F_n$, define the mapping $d_C$ on $F_n$ by $d_C(X) = XC - CX$
for $X \in F_n$. Prove that $d_C(XY) = (d_C(X))Y + X(d_C(Y))$.
(Does this remind you of the derivative?)

(b) Using (a), prove that if $AB - BA$ commutes with $A$, then for
any polynomial $q(x) \in F[x]$, $q(A)B - Bq(A) = q'(A)(AB - BA)$,
where $q'(x)$ is the derivative of $q(x)$.

*7. Use part (b) of Problem 6 to give a proof of the Jacobson lemma.
(Hint: Let $p(x)$ be the minimal polynomial for $A$ and consider $0 =
p(A)B - Bp(A)$.)

8. (a) If A is a triangular matrix, prove that the entries on the diagonal 

of A are exactly all the characteristic roots of A. 

(b) If A is triangular and the elements on its main diagonal are 0, 
prove that A is nilpotent. 

9. For any $A, B \in F_n$ and $\lambda \in F$ prove that $(A')' = A$, $(A + B)' =
A' + B'$, and $(\lambda A)' = \lambda A'$.

10. If $A$ is invertible, prove that $(A^{-1})' = (A')^{-1}$.

11. If $A$ is skew-symmetric, prove that the elements on its main diagonal
are all 0.

12. If $A$ and $B$ are symmetric matrices, prove that $AB$ is symmetric if
and only if $AB = BA$.

13. Give an example of an $A$ such that $AA' \neq A'A$.

*14. Show that A and A' are similar. 

15. The symmetric elements in F n form a vector space; find its dimension 
and exhibit a basis for it. 

*16. In F n let S denote the set of symmetric elements; prove that the 
subring of F n generated by S is all of F n . 

*17. If the characteristic of $F$ is 0 and $A \in F_n$ has trace 0 ($\operatorname{tr} A = 0$) prove
that there is a $C \in F_n$ such that $CAC^{-1}$ has only 0's on its main
diagonal.

*18. If $F$ is of characteristic 0 and $A \in F_n$ has trace 0, prove that there
exist $B, C \in F_n$ such that $A = BC - CB$. (Hint: First step, assume, by
result of Problem 17, that all the diagonal elements of $A$ are 0.)

19. (a) If $F$ is of characteristic not 2 and if $*$ is any adjoint on $F_n$, let
$S = \{A \in F_n \mid A^* = A\}$ and let $K = \{A \in F_n \mid A^* = -A\}$. Prove
that $S + K = F_n$.

(b) If $A \in F_n$ and $A = B + C$ where $B \in S$ and $C \in K$, prove that
$B$ and $C$ are unique and determine them.

20. (a) If $A, B \in S$ prove that $AB + BA \in S$.

(b) If $A, B \in K$ prove that $AB - BA \in K$.

(c) If $A \in S$ and $B \in K$ prove that $AB - BA \in S$ and that $AB +
BA \in K$.

21. If $\phi$ is an automorphism of the field $F$ we define the mapping $\Phi$ on
$F_n$ by: If $A = (\alpha_{ij})$ then $\Phi(A) = (\phi(\alpha_{ij}))$. Prove that $\Phi(A + B) =
\Phi(A) + \Phi(B)$ and that $\Phi(AB) = \Phi(A)\Phi(B)$ for all $A, B \in F_n$.

22. If $*$ and $\circledast$ define two adjoints on $F_n$, prove that the mapping
$\psi: A \to (A^*)^{\circledast}$ for every $A \in F_n$ satisfies $\psi(A + B) = \psi(A) + \psi(B)$
and $\psi(AB) = \psi(A)\psi(B)$ for every $A, B \in F_n$.

23. If $*$ is any adjoint on $F_n$ and $\lambda$ is a scalar matrix in $F_n$, prove that $\lambda^*$
must also be a scalar matrix.

*24. Suppose we know the following theorem: If $\psi$ is an automorphism
of $F_n$ (i.e., $\psi$ maps $F_n$ onto itself in such a way that $\psi(A + B) =
\psi(A) + \psi(B)$ and $\psi(AB) = \psi(A)\psi(B)$) such that $\psi(\lambda) = \lambda$ for
every scalar matrix $\lambda$, then there is an element $P \in F_n$ such that
$\psi(A) = PAP^{-1}$ for every $A \in F_n$. On the basis of this theorem, prove:
If $*$ is an adjoint of $F_n$ such that $\lambda^* = \lambda$ for every scalar matrix $\lambda$
then there exists a matrix $P \in F_n$ such that $A^* = PA'P^{-1}$ for every
$A \in F_n$. Moreover, $P^{-1}P'$ must be a scalar.

25. If $P \in F_n$ is such that $P^{-1}P' \neq 0$ is a scalar, prove that the mapping
defined by $A^* = PA'P^{-1}$ is an adjoint on $F_n$.

*26. Assuming the theorem about automorphisms stated in Problem 24,
prove the following: If $*$ is an adjoint on $F_n$ there is an automorphism
$\phi$ of $F$ of period 2 and an element $P \in F_n$ such that $A^* = P(\Phi(A))'P^{-1}$
for all $A \in F_n$ (for notation, see Problem 21). Moreover, $P$ must
satisfy $P^{-1}\Phi(P)'$ is a scalar.

Problems 24 and 26 indicate that a general adjoint on F n is not so far 
removed from the transpose as one would have guessed at first glance. 

*27. If $\psi$ is an automorphism of $F_n$ such that $\psi(\lambda) = \lambda$ for all scalars,
prove that there is a $P \in F_n$ such that $\psi(A) = PAP^{-1}$ for every $A \in F_n$.

In the remainder of the problems, $F$ will be the field of complex numbers and $*$ the
Hermitian adjoint on $F_n$.

28. If $A \in F_n$ prove that there are unique Hermitian matrices $B$ and $C$
such that $A = B + iC$ ($i^2 = -1$).

29. Prove that $\operatorname{tr} AA^* > 0$ if $A \neq 0$.

30. By directly computing the matrix entries, prove that if $A_1A_1^* +
\cdots + A_kA_k^* = 0$, then $A_1 = A_2 = \cdots = A_k = 0$.

31. If $A$ is in $F_n$ and if $BAA^* = 0$, prove that $BA = 0$.

32. If $A$ in $F_n$ is Hermitian and $BA^k = 0$, prove that $BA = 0$.

33. If $A \in F_n$ is Hermitian and if $\lambda$, $\mu$ are two distinct (real) characteristic
roots of $A$ and if $C(A - \lambda) = 0$ and $D(A - \mu) = 0$, prove that
$CD^* = DC^* = 0$.

*34. (a) Assuming that all the characteristic roots of the Hermitian matrix 
A are in the field of complex numbers, combining the results of 
Problems 32, 33, and the fact that the roots, then, must all be 
real and the result of the corollary to Theorem 6.6.1, prove that 
A can be brought to diagonal form; that is, there is a matrix P 
such that $PAP^{-1}$ is diagonal.

(b) In part (a) prove that $P$ could be chosen so that $PP^* = 1$.

35. Let $V_n = \{A \in F_n \mid AA^* = 1\}$. Prove that $V_n$ is a group under
matrix multiplication.

36. If $A$ commutes with $AA^* - A^*A$ prove that $AA^* = A^*A$.

6.9 Determinants 

The trace defines an important and useful function from the matrix ring
$F_n$ (and from $A_F(V)$) into $F$; its properties concern themselves, for the most
part, with additive properties of matrices. We now shall introduce the even
more important function, known as the determinant, which maps $F_n$ into $F$.
Its properties are closely tied to the multiplicative properties of matrices.

Aside from its effectiveness as a tool in proving theorems, the determinant 
is valuable in “practical” ways. Given a matrix T, in terms of explicit 
determinants we can construct a concrete polynomial whose roots are the 
characteristic roots of T ; even more, the multiplicity of a root of this poly¬ 
nomial corresponds to its multiplicity as a characteristic root of T. In fact, 
the characteristic polynomial of T, defined earlier, can be exhibited as this 
explicit, determinantal polynomial. 

Determinants also play a key role in the solution of systems of linear 
equations. It is from this direction that we shall motivate their definition. 

There are many ways to develop the theory of determinants, some very 
elegant and some deadly and ugly. We have chosen a way that is at neither 
of these extremes, but which for us has the advantage that we can reach the 
results needed for our discussion of linear transformations as quickly as 
possible. 

In what follows $F$ will be an arbitrary field, $F_n$ the ring of $n \times n$ matrices
over $F$, and $F^{(n)}$ the vector space of $n$-tuples over $F$. By a matrix we shall
tacitly understand an element in $F_n$. As usual, Greek letters will indicate
elements of $F$ (unless otherwise defined).

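Consider the system of equations

$$\begin{aligned} \alpha_{11}x_1 + \alpha_{12}x_2 &= \beta_1, \\ \alpha_{21}x_1 + \alpha_{22}x_2 &= \beta_2. \end{aligned}$$

We ask: Under what conditions on the $\alpha_{ij}$ can we solve for $x_1$, $x_2$ given
arbitrary $\beta_1$, $\beta_2$? Equivalently, given the matrix

$$A = \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix},$$

when does this map $F^{(2)}$ onto itself?

Proceeding as in high school, we eliminate $x_1$ between the two equations;
the criterion for solvability then turns out to be $\alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21} \neq 0$.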

We now try the system of three linear equations

$$\begin{aligned} \alpha_{11}x_1 + \alpha_{12}x_2 + \alpha_{13}x_3 &= \beta_1, \\ \alpha_{21}x_1 + \alpha_{22}x_2 + \alpha_{23}x_3 &= \beta_2, \\ \alpha_{31}x_1 + \alpha_{32}x_2 + \alpha_{33}x_3 &= \beta_3, \end{aligned}$$

and again ask for conditions for solvability given arbitrary $\beta_1$, $\beta_2$, $\beta_3$.
Eliminating $x_1$ between these two-at-a-time, and then $x_2$ from the resulting
two equations leads us to the criterion for solvability that

$$\alpha_{11}\alpha_{22}\alpha_{33} + \alpha_{12}\alpha_{23}\alpha_{31} + \alpha_{13}\alpha_{21}\alpha_{32} - \alpha_{12}\alpha_{21}\alpha_{33} - \alpha_{11}\alpha_{23}\alpha_{32} - \alpha_{13}\alpha_{22}\alpha_{31} \neq 0.$$

Using these two as models (and with the hindsight that all this will work)
we shall make the broad jump to the general case and shall define the
determinant of an arbitrary $n \times n$ matrix over $F$. But first a little notation!

Let $S_n$ be the symmetric group of degree $n$; we consider elements in $S_n$
to be acting on the set $\{1, 2, \ldots, n\}$. For $\sigma \in S_n$, $\sigma(i)$ will denote the image
of $i$ under $\sigma$. (We switch notation, writing the permutation as acting from
the left rather than, as previously, from the right. We do so to facilitate
writing subscripts.) The symbol $(-1)^\sigma$ for $\sigma \in S_n$ will mean $+1$ if $\sigma$ is an
even permutation and $-1$ if $\sigma$ is an odd permutation.

DEFINITION If $A = (\alpha_{ij})$ then the determinant of $A$, written $\det A$, is the
element $\sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)}\alpha_{2\sigma(2)} \cdots \alpha_{n\sigma(n)}$ in $F$.
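
The definition translates almost word for word into a short program. The sketch below is only an illustration: it sums over all permutations, so it takes on the order of $n!$ steps and is suitable for small matrices only. The function names `sign` and `det`, and the use of 0-based indices, are choices made for this sketch.

```python
from itertools import permutations


def sign(perm):
    """Return +1 for an even permutation of 0..n-1 and -1 for an odd one.

    The parity is found by counting inversions.
    """
    inversions = sum(1
                     for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1


def det(a):
    """Determinant as the signed sum over all permutations sigma of the
    products a[0][sigma(0)] * a[1][sigma(1)] * ... * a[n-1][sigma(n-1)]."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total


two_by_two = det([[1, 2], [3, 4]])                     # 1*4 - 2*3
triangular = det([[2, 0, 0], [5, 3, 0], [7, 8, 4]])    # zeros above the diagonal
print(two_by_two, triangular)
```

For the $2 \times 2$ case the sum has the two terms $\alpha_{11}\alpha_{22} - \alpha_{12}\alpha_{21}$, matching the solvability criterion found above.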

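We shall at times use the notation

$$\begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{vmatrix}$$

for the determinant of the matrix

$$\begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{pmatrix}.$$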
Note that the determinant of a matrix A is the sum (neglecting, for the 
moment, signs) of all possible products of entries of A, one entry taken 
from each row and column of A. In general, it is a messy job to expand the 
determinant of a matrix—after all there are n ! terms in the expansion—but 
for at least one type of matrix we can do this expansion visually, namely, 

LEMMA 6.9.1 The determinant of a triangular matrix is the product of its 
entries on the main diagonal. 

Proof. Being triangular implies two possibilities, namely, either all the
elements above the main diagonal are 0 or all the elements below the main
diagonal are 0. We prove the result for $A$ of the form

$$A = \begin{pmatrix} \alpha_{11} & 0 & \cdots & 0 \\ \alpha_{21} & \alpha_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{n1} & \alpha_{n2} & \cdots & \alpha_{nn} \end{pmatrix}$$

and indicate the slight change in argument for the other kind of triangular
matrices.

Since $\alpha_{1i} = 0$ unless $i = 1$, in the expansion of $\det A$ the only nonzero
contribution comes in those terms where $\sigma(1) = 1$. Thus, since $\sigma$ is a
permutation, $\sigma(2) \neq 1$; however, if $\sigma(2) > 2$, $\alpha_{2\sigma(2)} = 0$, thus to get a
nonzero contribution to $\det A$, $\sigma(2) = 2$. Continuing in this way, we must
have $\sigma(i) = i$ for all $i$, which is to say, in the expansion of $\det A$ the only
nonzero term arises when $\sigma$ is the identity element of $S_n$. Hence the sum of
the $n!$ terms reduces to just one term, namely, $\alpha_{11}\alpha_{22} \cdots \alpha_{nn}$, which is the
contention of the lemma.

If instead all the elements below the main diagonal of $A$ are 0, we start at
the opposite end, proving that for a nonzero contribution $\sigma(n) = n$, then
$\sigma(n - 1) = n - 1$, etc.

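Some special cases are of interest:

1. If

$$A = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_n \end{pmatrix}$$

is diagonal, $\det A = \lambda_1\lambda_2 \cdots \lambda_n$.

2. If

$$A = \begin{pmatrix} 1 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{pmatrix},$$

the identity matrix, then $\det A = 1$.

3. If

$$A = \begin{pmatrix} \lambda & & & \\ & \lambda & & \\ & & \ddots & \\ & & & \lambda \end{pmatrix},$$

the scalar matrix, then $\det A = \lambda^n$.

Note also that if a row (or column) of a matrix consists of 0's then the determinant
is 0, for each term of the expansion of the determinant would be a product
in which one element, at least, is 0, hence each term is 0.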

Given the matrix $A = (\alpha_{ij})$ in $F_n$ we can consider its first row $v_1 =
(\alpha_{11}, \alpha_{12}, \ldots, \alpha_{1n})$ as a vector in $F^{(n)}$; similarly, for its second row, $v_2$, and
the others. We then can consider $\det A$ as a function of the $n$ vectors
$v_1, \ldots, v_n$. Many results are most succinctly stated in these terms so we
shall often consider $\det A = d(v_1, \ldots, v_n)$; in this the notation is always
meant to imply that $v_1$ is the first row, $v_2$ the second, and so on, of $A$.

One further remark: Although we are working over a field, we could just
as easily assume that we are working over a commutative ring, except in
the obvious places where we divide by elements. This remark will only
enter when we discuss determinants of matrices having polynomial entries,
a little later in the section.

LEMMA 6.9.2 If $A \in F_n$ and $\gamma \in F$ then $d(v_1, \ldots, v_{i-1}, \gamma v_i, v_{i+1}, \ldots, v_n) =
\gamma\, d(v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_n)$.

Note that the lemma says that if all the elements in one row of $A$ are
multiplied by a fixed element $\gamma$ in $F$ then the determinant of $A$ is itself
multiplied by $\gamma$.

Proof. Since only the entries in the $i$th row are changed, the expansion
of $d(v_1, \ldots, v_{i-1}, \gamma v_i, v_{i+1}, \ldots, v_n)$ is

$$\sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{i-1,\sigma(i-1)} (\gamma\alpha_{i\sigma(i)}) \alpha_{i+1,\sigma(i+1)} \cdots \alpha_{n\sigma(n)};$$

since this equals $\gamma \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{i\sigma(i)} \cdots \alpha_{n\sigma(n)}$, it does indeed
equal $\gamma\, d(v_1, \ldots, v_n)$.

LEMMA 6.9.3

$$d(v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_n) + d(v_1, \ldots, v_{i-1}, u_i, v_{i+1}, \ldots, v_n) = d(v_1, \ldots, v_{i-1}, v_i + u_i, v_{i+1}, \ldots, v_n).$$


Before proving the result, let us see what it says and what it does not say.
It does not say that $\det A + \det B = \det(A + B)$; this is false as is manifest
in the example

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad B = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix},$$

where $\det A = \det B = 0$ while $\det(A + B) = 1$. It does say that if $A$
and $B$ are matrices equal everywhere but in the $i$th row then the new matrix
obtained from $A$ and $B$ by using all the rows of $A$ except the $i$th, and using
as $i$th row the sum of the $i$th row of $A$ and the $i$th row of $B$, has a
determinant equal to $\det A + \det B$. If


then 

det A 


A = 




— 2, det B = 



— 1 = det A + det B. 
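
To see concretely what the lemma does and does not assert, here is a small sketch. The $2 \times 2$ matrices in it are illustrative choices and are not claimed to be the ones printed in the text; the function `det` is a direct transcription of the permutation-sum definition and the names are assumptions made for this example.

```python
from itertools import permutations


def det(a):
    """Determinant via the signed sum over permutations (small matrices only)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        # Parity of the permutation from its number of inversions.
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if perm[i] > perm[j])
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total


# Hypothetical matrices agreeing in the first row, differing in the second.
A = [[1, 2], [3, 4]]
B = [[1, 2], [2, 3]]
# Keep the common first row and add the two second rows.
C = [A[0], [x + y for x, y in zip(A[1], B[1])]]
row_additive = det(C) == det(A) + det(B)

# The full matrix sum, by contrast, does not behave additively in general.
P = [[1, 0], [0, 0]]
Q = [[0, 0], [0, 1]]
P_plus_Q = [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]
fully_additive = det(P_plus_Q) == det(P) + det(Q)

print(row_additive, fully_additive)
```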


Proof. If $v_1 = (\alpha_{11}, \ldots, \alpha_{1n}), \ldots, v_i = (\alpha_{i1}, \ldots, \alpha_{in}), \ldots, v_n =
(\alpha_{n1}, \ldots, \alpha_{nn})$ and if $u_i = (\beta_{i1}, \ldots, \beta_{in})$, then

$$\begin{aligned} d(v_1, \ldots, v_{i-1}, u_i + v_i, v_{i+1}, \ldots, v_n) &= \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{i-1,\sigma(i-1)} (\alpha_{i\sigma(i)} + \beta_{i\sigma(i)}) \alpha_{i+1,\sigma(i+1)} \cdots \alpha_{n\sigma(n)} \\ &= \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{i-1,\sigma(i-1)} \alpha_{i\sigma(i)} \cdots \alpha_{n\sigma(n)} \\ &\quad + \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{i-1,\sigma(i-1)} \beta_{i\sigma(i)} \cdots \alpha_{n\sigma(n)} \\ &= d(v_1, \ldots, v_i, \ldots, v_n) + d(v_1, \ldots, u_i, \ldots, v_n). \end{aligned}$$

The properties embodied in Lemmas 6.9.1, 6.9.2, and 6.9.3, along with
that in the next lemma, can be shown to characterize the determinant
function (see Problem 13, end of this section). Thus, the formal property
exhibited in the next lemma is basic in the theory of determinants.

LEMMA 6.9.4 If two rows of $A$ are equal (that is, $v_r = v_s$ for $r \neq s$), then
$\det A = 0$.


Proof. Let $A = (\alpha_{ij})$ and suppose that for some $r$, $s$ where $r \neq s$,
$\alpha_{rj} = \alpha_{sj}$ for all $j$. Consider the expansion

$$\det A = \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{r\sigma(r)} \cdots \alpha_{s\sigma(s)} \cdots \alpha_{n\sigma(n)}.$$

In the expansion we pair the terms as follows: For $\sigma \in S_n$ we pair the term
$(-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)}$ with the term $(-1)^{\tau\sigma} \alpha_{1\tau\sigma(1)} \cdots \alpha_{n\tau\sigma(n)}$ where $\tau$ is
the transposition $(\sigma(r), \sigma(s))$. Since $\tau$ is a transposition and $\tau^2 = 1$, this
indeed gives us a pairing. However, since $\alpha_{r\sigma(r)} = \alpha_{s\sigma(r)}$, by assumption,
and $\sigma(r) = \tau\sigma(s)$, we have that $\alpha_{r\sigma(r)} = \alpha_{s\tau\sigma(s)}$. Similarly, $\alpha_{s\sigma(s)} =
\alpha_{r\tau\sigma(r)}$. On the other hand, for $i \neq r$ and $i \neq s$, since $\tau\sigma(i) = \sigma(i)$,
$\alpha_{i\sigma(i)} = \alpha_{i\tau\sigma(i)}$. Thus the terms $\alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)}$ and $\alpha_{1\tau\sigma(1)} \cdots \alpha_{n\tau\sigma(n)}$ are
equal. The first occurs with the sign $(-1)^\sigma$ and the second with the sign
$(-1)^{\tau\sigma}$ in the expansion of $\det A$. Since $\tau$ is a transposition and so an
odd permutation, $(-1)^{\tau\sigma} = -(-1)^\sigma$. Therefore in the pairing, the paired
terms cancel each other out in the sum, whence $\det A = 0$. (The proof
does not depend on the characteristic of $F$ and holds equally well even in
the case of characteristic 2.)

From the results so far obtained we can determine the effect, on a
determinant of a given matrix, of a given permutation of its rows.


LEMMA 6.9.5 Interchanging two rows of A changes the sign of its determinant. 

Proof. Since two rows are equal, by Lemma 6.9.4,

$$d(v_1, \ldots, v_{i-1}, v_i + v_j, v_{i+1}, \ldots, v_{j-1}, v_i + v_j, v_{j+1}, \ldots, v_n) = 0.$$

Using Lemma 6.9.3 several times, we can expand this to obtain

$$\begin{aligned} & d(v_1, \ldots, v_{i-1}, v_i, \ldots, v_{j-1}, v_j, \ldots, v_n) + d(v_1, \ldots, v_{i-1}, v_j, \ldots, v_{j-1}, v_i, \ldots, v_n) \\ &\quad + d(v_1, \ldots, v_{i-1}, v_i, \ldots, v_{j-1}, v_i, \ldots, v_n) + d(v_1, \ldots, v_{i-1}, v_j, \ldots, v_{j-1}, v_j, \ldots, v_n) = 0. \end{aligned}$$

However, each of the last two terms has in it two equal rows, whence, by
Lemma 6.9.4, each is 0. The above relation then reduces to

$$d(v_1, \ldots, v_{i-1}, v_i, \ldots, v_{j-1}, v_j, \ldots, v_n) + d(v_1, \ldots, v_{i-1}, v_j, \ldots, v_{j-1}, v_i, \ldots, v_n) = 0,$$

which is precisely the assertion of the lemma.


COROLLARY If the matrix $B$ is obtained from $A$ by a permutation of the rows
of $A$ then $\det A = \pm\det B$, the sign being $+1$ if the permutation is even, $-1$
if the permutation is odd.

We are now in a position to collect pieces to prove the basic algebraic
property of the determinant function, namely, that it preserves products. 
As a homomorphism of the multiplicative structure of F n into F the de¬ 
terminant will acquire certain important characteristics. 

THEOREM 6.9.1 For $A, B \in F_n$, $\det(AB) = (\det A)(\det B)$.

Proof. Let $A = (\alpha_{ij})$ and $B = (\beta_{ij})$; let the rows of $B$ be the vectors
$u_1, u_2, \ldots, u_n$. We introduce the $n$ vectors $w_1, \ldots, w_n$ as follows:

$$\begin{aligned} w_1 &= \alpha_{11}u_1 + \alpha_{12}u_2 + \cdots + \alpha_{1n}u_n, \\ w_2 &= \alpha_{21}u_1 + \alpha_{22}u_2 + \cdots + \alpha_{2n}u_n, \\ &\;\;\vdots \\ w_n &= \alpha_{n1}u_1 + \alpha_{n2}u_2 + \cdots + \alpha_{nn}u_n. \end{aligned}$$

Consider $d(w_1, \ldots, w_n)$; expanding this out and making many uses of
Lemmas 6.9.2 and 6.9.3, we obtain

$$d(w_1, \ldots, w_n) = \sum_{i_1, i_2, \ldots, i_n} \alpha_{1i_1}\alpha_{2i_2} \cdots \alpha_{ni_n}\, d(u_{i_1}, u_{i_2}, \ldots, u_{i_n}).$$

In this multiple sum $i_1, \ldots, i_n$ run independently from 1 to $n$. However, if
any two $i_r = i_s$ then $u_{i_r} = u_{i_s}$ whence $d(u_{i_1}, \ldots, u_{i_r}, \ldots, u_{i_s}, \ldots, u_{i_n}) = 0$
by Lemma 6.9.4. In other words, the only terms in the sum that may give a
nonzero contribution are those for which all of $i_1, i_2, \ldots, i_n$ are distinct,
that is for which the mapping

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}$$

is a permutation of $1, 2, \ldots, n$. Also any such permutation is possible.
Finally note that by the corollary to Lemma 6.9.5, when

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}$$

is a permutation, then $d(u_{i_1}, u_{i_2}, \ldots, u_{i_n}) = (-1)^\sigma d(u_1, \ldots, u_n) =
(-1)^\sigma \det B$. Thus we get

$$\begin{aligned} d(w_1, \ldots, w_n) &= \sum_{\sigma \in S_n} \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)} (-1)^\sigma \det B \\ &= (\det B) \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)} \\ &= (\det B)(\det A). \end{aligned}$$

We now wish to identify $d(w_1, \ldots, w_n)$ as $\det(AB)$. However, since

$$w_1 = \alpha_{11}u_1 + \cdots + \alpha_{1n}u_n, \quad w_2 = \alpha_{21}u_1 + \cdots + \alpha_{2n}u_n, \quad \ldots, \quad w_n = \alpha_{n1}u_1 + \cdots + \alpha_{nn}u_n,$$

we get that $d(w_1, \ldots, w_n)$ is $\det C$ where the first row of $C$ is $w_1$, the second
is $w_2$, etc.

However, if we write out $w_1$ in terms of coordinates we obtain

$$\begin{aligned} w_1 &= \alpha_{11}u_1 + \cdots + \alpha_{1n}u_n = \alpha_{11}(\beta_{11}, \beta_{12}, \ldots, \beta_{1n}) + \cdots + \alpha_{1n}(\beta_{n1}, \ldots, \beta_{nn}) \\ &= (\alpha_{11}\beta_{11} + \alpha_{12}\beta_{21} + \cdots + \alpha_{1n}\beta_{n1},\; \alpha_{11}\beta_{12} + \cdots + \alpha_{1n}\beta_{n2},\; \ldots,\; \alpha_{11}\beta_{1n} + \cdots + \alpha_{1n}\beta_{nn}) \end{aligned}$$

which is the first row of $AB$. Similarly $w_2$ is the second row of $AB$, and so
for the other rows. Thus we have $C = AB$. Since $\det(AB) = \det C =
d(w_1, \ldots, w_n) = (\det A)(\det B)$, we have proved the theorem.
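
The multiplicative property is easy to test on examples. The sketch below compares $\det(AB)$ with $(\det A)(\det B)$ for one arbitrarily chosen pair of integer matrices; `det` is again a direct transcription of the permutation-sum definition, and all names and sample values are assumptions made for this illustration.

```python
from itertools import permutations


def det(a):
    """Determinant via the signed sum over permutations (small matrices only)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if perm[i] > perm[j])
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


# Sample matrices, chosen arbitrarily.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 6]]

det_of_product = det(matmul(A, B))
product_of_dets = det(A) * det(B)
print(det_of_product, product_of_dets)
```

Row $i$ of `matmul(A, B)` is the combination $\alpha_{i1}u_1 + \cdots + \alpha_{in}u_n$ of the rows $u_k$ of $B$, which is exactly the vector $w_i$ used in the proof.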

COROLLARY 1 If $A$ is invertible then $\det A \neq 0$ and $\det(A^{-1}) =
(\det A)^{-1}$.

Proof. Since $AA^{-1} = 1$, $\det(AA^{-1}) = \det 1 = 1$. Thus by the theorem,
$1 = \det(AA^{-1}) = (\det A)(\det A^{-1})$. This relation then states that
$\det A \neq 0$ and $\det A^{-1} = 1/\det A$.

COROLLARY 2 If $A$ is invertible then for all $B$, $\det(ABA^{-1}) = \det B$.

Proof. Using the theorem, as applied to $(AB)A^{-1}$, we get
$\det((AB)A^{-1}) = \det(AB)\det(A^{-1}) = \det A \det B \det(A^{-1})$. Invoking
Corollary 1, we reduce this further to $\det B$. Thus $\det(ABA^{-1}) = \det B$.

Corollary 2 allows us to define the determinant of a linear transformation. 
For, let $T \in A(V)$ and let $m_1(T)$ be the matrix of $T$ in some basis of $V$.
Given another basis, if $m_2(T)$ is the matrix of $T$ in this second basis, then
by Theorem 6.3.2, $m_2(T) = Cm_1(T)C^{-1}$, hence $\det(m_2(T)) = \det(m_1(T))$
by Corollary 2 above. That is, the matrix of $T$ in any basis has the same
determinant. Thus the definition: $\det T = \det m_1(T)$ is in fact independent
of the basis and provides A(V) with a determinant function. 

In one of the earlier problems, it was the aim of the problem to prove that 
A', the transpose of A, is similar to A. Were this so (and it is), then A' and 
A, by Corollary 2, above would have the same determinant. Thus we should 
not be surprised that we can give a direct proof of this fact. 

LEMMA 6.9.6 det A = det (A'). 

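Proof. Let $A = (\alpha_{ij})$ and $A' = (\beta_{ij})$; of course, $\beta_{ij} = \alpha_{ji}$. Now

$$\det A = \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)}$$

while

$$\det A' = \sum_{\sigma \in S_n} (-1)^\sigma \beta_{1\sigma(1)} \cdots \beta_{n\sigma(n)} = \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{\sigma(1)1} \cdots \alpha_{\sigma(n)n}.$$

However, the term $(-1)^\sigma \alpha_{\sigma(1)1} \cdots \alpha_{\sigma(n)n}$ is equal to
$(-1)^\sigma \alpha_{1\sigma^{-1}(1)} \cdots \alpha_{n\sigma^{-1}(n)}$. (Prove!) But $\sigma$ and $\sigma^{-1}$ are of the same parity,
that is, if $\sigma$ is odd, then so is $\sigma^{-1}$, whereas if $\sigma$ is even then $\sigma^{-1}$ is even. Thus

$$(-1)^\sigma \alpha_{1\sigma^{-1}(1)} \cdots \alpha_{n\sigma^{-1}(n)} = (-1)^{\sigma^{-1}} \alpha_{1\sigma^{-1}(1)} \cdots \alpha_{n\sigma^{-1}(n)}.$$

Finally as $\sigma$ runs over $S_n$ then $\sigma^{-1}$ runs over $S_n$. Thus

$$\begin{aligned} \det A' &= \sum_{\sigma^{-1} \in S_n} (-1)^{\sigma^{-1}} \alpha_{1\sigma^{-1}(1)} \cdots \alpha_{n\sigma^{-1}(n)} \\ &= \sum_{\sigma \in S_n} (-1)^\sigma \alpha_{1\sigma(1)} \cdots \alpha_{n\sigma(n)} \\ &= \det A. \end{aligned}$$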

In light of Lemma 6.9.6, interchanging the rows and columns of a matrix 
does not change its determinant. But then Lemmas 6.9.2-6.9.5, which held
for operations with rows of the matrix, hold equally for the columns of the same matrix.

We make immediate use of the remark to derive Cramer’s rule for solving 
a system of linear equations. 

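Given the system of linear equations

$$\begin{aligned} \alpha_{11}x_1 + \cdots + \alpha_{1n}x_n &= \beta_1 \\ &\;\;\vdots \\ \alpha_{n1}x_1 + \cdots + \alpha_{nn}x_n &= \beta_n, \end{aligned}$$

we call $A = (\alpha_{ij})$ the matrix of the system and $\Delta = \det A$ the determinant of
the system.

Suppose that $\Delta \neq 0$; that is,

$$\Delta = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1n} \\ \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{nn} \end{vmatrix} \neq 0.$$

By Lemma 6.9.2 (as modified for columns instead of rows),

$$x_i\Delta = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1i}x_i & \cdots & \alpha_{1n} \\ \vdots & & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{ni}x_i & \cdots & \alpha_{nn} \end{vmatrix}.$$

However, as a consequence of Lemmas 6.9.3, 6.9.4, we can add any multiple
of a column to another without changing the determinant (see Problem 5).
Add to the $i$th column of $x_i\Delta$, $x_1$ times the first column, $x_2$ times the second,
$\ldots$, $x_j$ times the $j$th column (for $j \neq i$). Thus

$$x_i\Delta = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1,i-1} & (\alpha_{11}x_1 + \alpha_{12}x_2 + \cdots + \alpha_{1n}x_n) & \alpha_{1,i+1} & \cdots & \alpha_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{n,i-1} & (\alpha_{n1}x_1 + \alpha_{n2}x_2 + \cdots + \alpha_{nn}x_n) & \alpha_{n,i+1} & \cdots & \alpha_{nn} \end{vmatrix}$$

and using $\alpha_{k1}x_1 + \cdots + \alpha_{kn}x_n = \beta_k$, we finally see that

$$x_i\Delta = \begin{vmatrix} \alpha_{11} & \cdots & \alpha_{1,i-1} & \beta_1 & \alpha_{1,i+1} & \cdots & \alpha_{1n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ \alpha_{n1} & \cdots & \alpha_{n,i-1} & \beta_n & \alpha_{n,i+1} & \cdots & \alpha_{nn} \end{vmatrix} = \Delta_i, \text{ say.}$$

Hence, $x_i = \Delta_i/\Delta$. This is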

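THEOREM 6.9.2 (Cramer's Rule) If the determinant, $\Delta$, of the system of
linear equations

$$\begin{aligned} \alpha_{11}x_1 + \cdots + \alpha_{1n}x_n &= \beta_1 \\ &\;\;\vdots \\ \alpha_{n1}x_1 + \cdots + \alpha_{nn}x_n &= \beta_n \end{aligned}$$

is different from 0, then the solution of the system is given by $x_i = \Delta_i/\Delta$, where
$\Delta_i$ is the determinant obtained from $\Delta$ by replacing in $\Delta$ the $i$th column by
$\beta_1, \beta_2, \ldots, \beta_n$.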

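Example The system

$$\begin{aligned} x_1 + 2x_2 + 3x_3 &= -5, \\ 2x_1 + x_2 + x_3 &= -7, \\ x_1 + x_2 + x_3 &= 0, \end{aligned}$$

has determinant

$$\Delta = \begin{vmatrix} 1 & 2 & 3 \\ 2 & 1 & 1 \\ 1 & 1 & 1 \end{vmatrix} = 1 \neq 0,$$

hence

$$x_1 = \frac{\begin{vmatrix} -5 & 2 & 3 \\ -7 & 1 & 1 \\ 0 & 1 & 1 \end{vmatrix}}{\Delta}, \qquad x_2 = \frac{\begin{vmatrix} 1 & -5 & 3 \\ 2 & -7 & 1 \\ 1 & 0 & 1 \end{vmatrix}}{\Delta}, \qquad x_3 = \frac{\begin{vmatrix} 1 & 2 & -5 \\ 2 & 1 & -7 \\ 1 & 1 & 0 \end{vmatrix}}{\Delta}.$$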
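
Cramer's rule for the example can be carried out mechanically. The sketch below is an illustration only: it uses the permutation-sum determinant (so it is practical for small systems only) and Python's `fractions.Fraction` for exact division; the function names `det` and `cramer` are choices made for this sketch.

```python
from fractions import Fraction
from itertools import permutations


def det(a):
    """Determinant via the signed sum over permutations (small matrices only)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for i in range(n) for j in range(i + 1, n)
                  if perm[i] > perm[j])
        term = -1 if inv % 2 else 1
        for i in range(n):
            term *= a[i][perm[i]]
        total += term
    return total


def cramer(a, b):
    """Solve the system with coefficient matrix a and right-hand side b."""
    delta = det(a)
    if delta == 0:
        raise ValueError("the determinant of the system is 0")
    solution = []
    for i in range(len(a)):
        # Replace the i-th column of a by the column of right-hand sides.
        a_i = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(a)]
        solution.append(Fraction(det(a_i), delta))
    return solution


# The system of the example.
A = [[1, 2, 3], [2, 1, 1], [1, 1, 1]]
beta = [-5, -7, 0]

x = cramer(A, beta)
# Substitute the solution back into the left-hand sides as a check.
residual = [sum(c * v for c, v in zip(row, x)) for row in A]
print(x, residual)
```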


We can interrelate invertibility of a matrix (or linear transformation) 
with the value of its determinant. Thus the determinant provides us with a 
criterion for invertibility. 

THEOREM 6.9.3 $A$ is invertible if and only if $\det A \neq 0$.

Proof. If $A$ is invertible, we have seen, in Corollary 1 to Theorem 6.9.1,
that $\det A \neq 0$.

Suppose, on the other hand, that $\det A \neq 0$ where $A = (\alpha_{ij})$. By
Cramer's rule we can solve the system

$$\begin{aligned} \alpha_{11}x_1 + \cdots + \alpha_{1n}x_n &= \beta_1 \\ &\;\;\vdots \\ \alpha_{n1}x_1 + \cdots + \alpha_{nn}x_n &= \beta_n \end{aligned}$$

for $x_1, \ldots, x_n$ given arbitrary $\beta_1, \ldots, \beta_n$. Thus, as a linear transformation
on $F^{(n)}$, $A'$ is onto; in fact the vector $(\beta_1, \ldots, \beta_n)$ is the image under $A'$ of
$(\Delta_1/\Delta, \ldots, \Delta_n/\Delta)$. Being onto, by Theorem 6.1.4, $A'$ is invertible, hence $A$
is invertible (Prove!).

We can see Theorem 6.9.3 from an alternative, and possibly more in¬ 
teresting, point of view. Given A e F n we can embed it in K n where K is an 
extension of F chosen so that in K n , A can be brought to triangular form. 
Thus there is a $B \in K_n$ such that

$$BAB^{-1} = \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ * & & & \lambda_n \end{pmatrix};$$

here $\lambda_1, \ldots, \lambda_n$ are all the characteristic roots of $A$, each occurring as
often as its multiplicity as a characteristic root of $A$. Thus $\det A =
\det(BAB^{-1}) = \lambda_1\lambda_2 \cdots \lambda_n$ by Lemma 6.9.1. However, $A$ is invertible
if and only if none of its characteristic roots is 0; but $\det A \neq 0$ if and
only if $\lambda_1\lambda_2 \cdots \lambda_n \neq 0$, that is to say, if no characteristic root of $A$ is 0.
Thus $A$ is invertible if and only if $\det A \neq 0$.


This alternative argument has some advantages, for in carrying it out we 
actually proved a subresult interesting in its own right, namely, 


LEMMA 6.9.7 det A is the product, counting multiplicities, of the characteristic 
roots of A. 


DEFINITION Given $A \in F_n$, the secular equation of $A$ is the polynomial
$\det(x - A)$ in $F[x]$.


Usually what we have called the secular equation of A is called the 
characteristic polynomial of A. However, we have already defined the 
characteristic polynomial of A to be the product of its elementary divisors. 
It is a fact (see Problem 8) that the characteristic polynomial of A equals its secular 
equation, but since we did not want to develop this explicitly in the text, we 
have introduced the term secular equation. 

Let us compute an example. If

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix},$$

then

$$x - A = \begin{pmatrix} x - 1 & -2 \\ -3 & x \end{pmatrix},$$

hence $\det(x - A) = (x - 1)x - (-2)(-3) = x^2 - x - 6$. Thus the
secular equation of

$$\begin{pmatrix} 1 & 2 \\ 3 & 0 \end{pmatrix}$$

is $x^2 - x - 6$.
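The computation above can be checked mechanically. The following is a minimal sketch, assuming only the $2 \times 2$ identity $\det(x - A) = x^2 - (\operatorname{tr} A)x + \det A$; the helper names `secular_2x2` and `evaluate` are illustrative and not taken from the text.

```python
def secular_2x2(a):
    """Coefficients (highest degree first) of det(x - A) for a 2x2 matrix A."""
    (p, q), (r, s) = a
    trace = p + s
    det = p * s - q * r
    return (1, -trace, det)          # x^2 - (tr A) x + det A


def evaluate(coeffs, x):
    """Evaluate a polynomial (highest degree first) at x by Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


A = [[1, 2], [3, 0]]
coeffs = secular_2x2(A)              # encodes x^2 - x - 6
# Search a small range of integers for roots of the secular equation.
roots = [x for x in range(-10, 11) if evaluate(coeffs, x) == 0]
print(coeffs, roots)
```

The integer roots found this way, $-2$ and $3$, are the characteristic roots of the matrix in the example.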

A few remarks about the secular equation: If $\lambda$ is a root of $\det(x - A)$,
then $\det(\lambda - A) = 0$; hence by Theorem 6.9.3, $\lambda - A$ is not invertible.
Thus $\lambda$ is a characteristic root of $A$. Conversely, if $\lambda$ is a characteristic root
of $A$, then $\lambda - A$ is not invertible, whence $\det(\lambda - A) = 0$ and so $\lambda$ is a root
of $\det(x - A)$. Thus the explicit, computable polynomial, the secular
equation of $A$, provides us with a polynomial whose roots are exactly the characteristic
roots of $A$. We want to go one step further and to argue that a given root
enters as a root of the secular equation precisely as often as it has multiplicity
as a characteristic root of $A$. For if $\lambda_i$ is a characteristic root of $A$ with
multiplicity $m_i$, we can bring $A$ to triangular form so that we have the
matrix shown in Figure 6.9.1, where each $\lambda_i$ appears on the diagonal $m_i$
times.

$$BAB^{-1} = \begin{pmatrix}
\lambda_1 & & & & & & 0 \\
 & \ddots & & & & & \\
 & & \lambda_1 & & & & \\
 & & & \ddots & & & \\
 & & & & \lambda_k & & \\
 & & & & & \ddots & \\
 * & & & & & & \lambda_k
\end{pmatrix}$$

Figure 6.9.1

But as indicated by the matrix in Figure 6.9.2, $\det(x - A) =
\det(B(x - A)B^{-1}) = (x - \lambda_1)^{m_1}(x - \lambda_2)^{m_2}\cdots(x - \lambda_k)^{m_k}$, and so each
$\lambda_i$, whose multiplicity as a characteristic root of $A$ is $m_i$, is a root of the
polynomial $\det(x - A)$ of multiplicity exactly $m_i$.

$$B(x - A)B^{-1} = x - BAB^{-1} = \begin{pmatrix}
x - \lambda_1 & & & & & & 0 \\
 & \ddots & & & & & \\
 & & x - \lambda_1 & & & & \\
 & & & \ddots & & & \\
 & & & & x - \lambda_k & & \\
 & & & & & \ddots & \\
 * & & & & & & x - \lambda_k
\end{pmatrix}$$

Figure 6.9.2

We have proved

THEOREM 6.9.4 The characteristic roots of $A$ are the roots, with the correct
multiplicity, of the secular equation, $\det(x - A)$, of $A$.

We finish the section with the significant and historic Cayley-Hamilton 
theorem. 

THEOREM 6.9.5 Every $A \in F_n$ satisfies its secular equation.

Proof. Given any invertible $B \in K_n$ for any extension $K$ of $F$, $A \in F_n$
and $BAB^{-1}$ satisfy the same polynomials. Also, since $\det(x - BAB^{-1}) =
\det(B(x - A)B^{-1}) = \det(x - A)$, $BAB^{-1}$ and $A$ have the same secular
equation. If we can show that some $BAB^{-1}$ satisfies its secular equation,
then it will follow that $A$ does. But we can pick $K \supset F$ and $B \in K_n$ so
that $BAB^{-1}$ is triangular; in that case we have seen long ago (Theorem
6.4.2) that a triangular matrix satisfies its secular equation. Thus the
theorem is proved.
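The Cayley-Hamilton theorem lends itself to a direct numerical check. The sketch below is not the method of the text: it computes the coefficients of $\det(x - A)$ by the Faddeev-LeVerrier recursion, a swapped-in technique that divides by $1, \ldots, n$ and so assumes a field of characteristic 0 (here the rationals, via `Fraction`). All function names are illustrative.

```python
from fractions import Fraction


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def secular_coeffs(a):
    """Coefficients c[0..n] with det(x - A) = sum of c[k] x^k (Faddeev-LeVerrier)."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]            # M_0 = 0
    for k in range(1, n + 1):
        am = mat_mul(a, m)
        # M_k = A M_{k-1} + c_{n-k+1} I
        m = [[am[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        am = mat_mul(a, m)
        c[n - k] = -sum(am[i][i] for i in range(n)) / k  # c_{n-k} = -tr(A M_k)/k
    return c


def poly_at_matrix(c, a):
    """Evaluate sum of c[k] A^k by Horner's rule."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        result = mat_mul(result, a)
        for i in range(n):
            result[i][i] += coeff
    return result


A = [[2, 1, 0], [0, 2, 0], [1, 0, 3]]
c = secular_coeffs(A)                 # x^3 - 7x^2 + 16x - 12 = (x - 2)^2 (x - 3)
zero = poly_at_matrix(c, A)           # every entry is 0
print([int(x) for x in c], zero == [[0] * 3 for _ in range(3)])
```

Exact rational arithmetic is used so that the zero matrix is obtained exactly rather than up to rounding.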


Problems 


1. If F is the field of complex numbers, evaluate the following determi¬ 
nants : 


(a) 



i 

3 


(b) 


2 3 
5 6 
8 9 


(c) 


5 6 

4 3 

10 12 
1 2 


8 

0 

16 

3 



2. For what characteristics of F are the following determinants 0: 


(a) 


1 

3 

1 

2 


2 3 0 
2 1 0 
1 1 1 
4 5 6 


(b) 


4 

5 
3 


5 

3 ? 

4 


3. If $A$ is a matrix with integer entries such that $A^{-1}$ is also a matrix
with integer entries, what can the values of $\det A$ possibly be?

4. Prove that if you add a multiple of one row to another you do not
change the value of the determinant.

*5. Given the matrix $A = (a_{ij})$ let $A_{ij}$ be the matrix obtained from $A$ by
removing the $i$th row and $j$th column. Let $M_{ij} = (-1)^{i+j} \det A_{ij}$.
$M_{ij}$ is called the cofactor of $a_{ij}$. Prove that $\det A = a_{i1}M_{i1} + \cdots +
a_{in}M_{in}$.

6. (a) If $A$ and $B$ are square submatrices, prove that

$$\det \begin{pmatrix} A & C \\ 0 & B \end{pmatrix} = (\det A)(\det B).$$

(b) Generalize part (a) to

$$\begin{pmatrix} A_1 & & & * \\ & A_2 & & \\ & & \ddots & \\ 0 & & & A_k \end{pmatrix},$$

where each $A_i$ is a square submatrix.

7. If $C(f)$ is the companion matrix of the polynomial $f(x)$, prove that
the secular equation of $C(f)$ is $f(x)$.

8. Using Problems 6 and 7, prove that the secular equation of $A$ is its
characteristic polynomial. (See Section 6.7; this proves the remark
made earlier that the roots of $p_T(x)$ occur with multiplicities equal to
their multiplicities as characteristic roots of $T$.)

9. Using Problem 8, give an alternative proof of the Cayley-Hamilton 
theorem. 

10. If F is the field of rational numbers, compute the secular equation, 
characteristic roots, and their multiplicities, of 


(a) $\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$.


(b) 


2 

2 

4 



(c) 


( 4 1 1 

1 4 1 

1 1 4 

1 1 1 



11. For each matrix in Problem 10 verify by direct matrix computation 
that it satisfies its secular equation. 

*12. If the rank of A is r, prove that there is a square r x r submatrix of 
A of determinant different from 0, and if r < n, that there is no 
(r + 1) x (r + 1) submatrix of A with this property. 

*13. Let $f$ be a function on $n$ variables from $F^{(n)}$ to $F$ such that

(a) $f(v_1, \ldots, v_n) = 0$ for $v_i = v_j \in F^{(n)}$ for $i \neq j$.

(b) $f(v_1, \ldots, \alpha v_i, \ldots, v_n) = \alpha f(v_1, \ldots, v_n)$ for each $i$, and $\alpha \in F$.

(c) $f(v_1, \ldots, v_i + u_i, v_{i+1}, \ldots, v_n) = f(v_1, \ldots, v_{i-1}, v_i, v_{i+1}, \ldots, v_n)
+ f(v_1, \ldots, v_{i-1}, u_i, v_{i+1}, \ldots, v_n)$.

(d) $f(e_1, \ldots, e_n) = 1$, where $e_1 = (1, 0, \ldots, 0)$, $e_2 = (0, 1, 0, \ldots, 0)$,
$\ldots$, $e_n = (0, 0, \ldots, 0, 1)$.

Prove that $f(v_1, \ldots, v_n) = \det A$ for any $A \in F_n$, where $v_1$ is the
first row of $A$, $v_2$ the second, etc.

14. Use Problem 13 to prove that det A' = det A. 
15. (a) Prove that $AB$ and $BA$ have the same secular (characteristic)
equation.

(b) Give an example where AB and BA do not have the same minimal 
polynomial. 

16. If A is triangular prove by a direct computation that A satisfies its 
secular equation. 

17. Use Cramer's rule to compute the solutions, in the real field, of the
systems

(a) $x + y + z = 1$,
    $2x + 3y + 4z = 1$,
    $x - y - z = 0$.

(b) $x + y + z + w = 1$,
    $x + 2y + 3z + 4w = 0$,
    $x + y + 4z + 5w = 1$,
    $x + y + 5z + 6w = 0$.

18. (a) Let $GL(n, F)$ be the set of all elements in $F_n$ whose determinant
is different from 0. Prove $GL(n, F)$ is a group under matrix
multiplication.

(b) Let $D(n, F) = \{A \in GL(n, F) \mid \det A = 1\}$. Prove that $D(n, F)$
is a normal subgroup of $GL(n, F)$.

(c) Prove that $GL(n, F)/D(n, F)$ is isomorphic to the group of nonzero
elements of $F$ under multiplication.

19. If $K$ is an extension field of $F$, let $E(n, K, F) = \{A \in GL(n, K) \mid
\det A \in F\}$.

(a) Prove that $E(n, K, F)$ is a normal subgroup of $GL(n, K)$.

*(b) Determine $GL(n, K)/E(n, K, F)$.

*20. If $F$ is the field of rational numbers, prove that when $N$ is a normal
subgroup of $D(2, F)$ then either $N = D(2, F)$ or $N$ consists only of
scalar matrices.

6.10 Hermitian, Unitary, and Normal Transformations 

In our previous considerations about linear transformations, the specific 
nature of the field F has played a relatively insignificant role. When it did 
make itself felt it was usually in regard to the presence or absence of charac¬ 
teristic roots. Now, for the first time, we shall restrict the field F —generally 
it will be the field of complex numbers but at times it may be the field of 
real numbers—and we shall make heavy use of the properties of real and 
complex numbers. Unless explicitly stated otherwise, in all of this section F will
denote the field of complex numbers.

We shall also be making extensive and constant use of the notions and 
results of Section 4.4 about inner product spaces. The reader would be 
well advised to review and to digest thoroughly that material before 
proceeding. 
One further remark about the complex numbers: Until now we have
managed to avoid using results that were not proved in the book. Now, 
however, we are forced to deviate from this policy and to call on a basic 
fact about the field of complex numbers, often known as “the fundamental 
theorem of algebra,” without establishing it ourselves. It displeases us to pull 
such a basic result out of the air, to state it as a fact, and then to make use 
of it. Unfortunately, it is essential for what follows and to digress to prove 
it here would take us too far afield. We hope that the majority of readers 
will have seen it proved in a course on complex variable theory. 

FACT 1 A polynomial with coefficients which are complex numbers has all its
roots in the complex field.

Equivalently, Fact 1 can be stated in the form that the only nonconstant 
irreducible polynomials over the field of complex numbers are those of 
degree 1. 

FACT 2 The only irreducible, nonconstant polynomials over the field of real
numbers are either of degree 1 or of degree 2.

The formula for the roots of a quadratic equation allows us to prove easily
the equivalence of Facts 1 and 2.

The immediate implication, for us, of Fact 1 will be that every linear
transformation which we shall consider will have all its characteristic roots in the
field of complex numbers.

In what follows, $V$ will be a finite-dimensional inner-product space over
$F$, the field of complex numbers; the inner product of two elements of $V$
will be written, as it was before, as $(v, w)$.

LEMMA 6.10.1 If $T \in A(V)$ is such that $(vT, v) = 0$ for all $v \in V$, then
$T = 0$.

Proof. Since $(vT, v) = 0$ for $v \in V$, given $u, w \in V$, $((u + w)T, u + w) =
0$. Expanding this out and making use of $(uT, u) = (wT, w) = 0$, we
obtain

$$(uT, w) + (wT, u) = 0 \quad \text{for all } u, w \in V. \tag{1}$$

Since equation (1) holds for arbitrary $w$ in $V$, it still must hold if we
replace in it $w$ by $iw$ where $i^2 = -1$; but $(uT, iw) = -i(uT, w)$ whereas
$((iw)T, u) = i(wT, u)$. Substituting these values in (1) and canceling out $i$
leads us to

$$-(uT, w) + (wT, u) = 0. \tag{2}$$

Adding (1) and (2) we get $(wT, u) = 0$ for all $u, w \in V$, whence, in
particular, $(wT, wT) = 0$. By the defining properties of an inner-product
space, this forces $wT = 0$ for all $w \in V$, hence $T = 0$. (Note: If $V$ is an
inner-product space over the real field, the lemma may be false. For
example, let $V = \{(\alpha, \beta) \mid \alpha, \beta \text{ real}\}$, where the inner product is the dot
product. Let $T$ be the linear transformation sending $(\alpha, \beta)$ into $(-\beta, \alpha)$.
A simple check shows that $(vT, v) = 0$ for all $v \in V$, yet $T \neq 0$.)
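The role of the complex scalars in Lemma 6.10.1 can be seen concretely. The sketch below is an illustration under stated assumptions: `inner` implements the inner product as linear in the first argument and conjugate-linear in the second, and `apply_T` is the map $(\alpha, \beta) \to (-\beta, \alpha)$ from the note, extended to complex pairs by the same formula. Both names are hypothetical.

```python
def inner(u, v):
    """Inner product on row vectors: linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def apply_T(v):
    """The transformation (a, b) -> (-b, a) from the note."""
    a, b = v
    return (-b, a)


# On real vectors (vT, v) always vanishes although T is not 0.
real_vectors = [(1, 0), (0, 1), (3, -2), (5, 7)]
real_values = [inner(apply_T(v), v) for v in real_vectors]

# Once complex vectors are allowed, (vT, v) need not vanish.
v = (1, 1j)
complex_value = inner(apply_T(v), v)

print(real_values, complex_value)
```

For $v = (1, i)$ one finds $vT = (-i, 1)$ and $(vT, v) = -2i$, so the hypothesis of the lemma fails over the complex field, as it must since $T \neq 0$.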

DEFINITION The linear transformation $T \in A(V)$ is said to be unitary
if $(uT, vT) = (u, v)$ for all $u, v \in V$.

A unitary transformation is one which preserves all the structure of V, 
its addition, its multiplication by scalars and its inner product. Note that a 
unitary transformation preserves length for 

$$\|v\| = \sqrt{(v, v)} = \sqrt{(vT, vT)} = \|vT\|.$$

Is the converse true? The answer is provided us in

LEMMA 6.10.2 If $(vT, vT) = (v, v)$ for all $v \in V$ then $T$ is unitary.

Proof. The proof is in the spirit of that of Lemma 6.10.1. Let $u, v \in V$;
by assumption $((u + v)T, (u + v)T) = (u + v, u + v)$. Expanding this
out and simplifying, we obtain

$$(uT, vT) + (vT, uT) = (u, v) + (v, u), \tag{1}$$

for $u, v \in V$. In (1) replace $v$ by $iv$; computing the necessary parts, this yields

$$-(uT, vT) + (vT, uT) = -(u, v) + (v, u). \tag{2}$$

Adding (1) and (2) results in $(vT, uT) = (v, u)$ for all $u, v \in V$, hence
$T$ is unitary.

We characterize the property of being unitary in terms of action on a 
basis of V. 

THEOREM 6.10.1 The linear transformation T on V is unitary if and only if 
it takes an orthonormal basis of V into an orthonormal basis of V. 

Proof. Suppose that $\{v_1, \ldots, v_n\}$ is an orthonormal basis of $V$; thus
$(v_i, v_j) = 0$ for $i \neq j$ while $(v_i, v_i) = 1$. We wish to show that if $T$ is
unitary, then $\{v_1T, \ldots, v_nT\}$ is also an orthonormal basis of $V$. But
$(v_iT, v_jT) = (v_i, v_j) = 0$ for $i \neq j$ and $(v_iT, v_iT) = (v_i, v_i) = 1$, thus
indeed $\{v_1T, \ldots, v_nT\}$ is an orthonormal basis of $V$.

On the other hand, if $T \in A(V)$ is such that both $\{v_1, \ldots, v_n\}$ and
$\{v_1T, \ldots, v_nT\}$ are orthonormal bases of $V$, if $u, w \in V$ then

$$u = \sum_{i=1}^{n} \alpha_i v_i, \qquad w = \sum_{i=1}^{n} \beta_i v_i,$$

whence by the orthonormality of the $v_i$'s,

$$(u, w) = \sum_{i=1}^{n} \alpha_i \bar{\beta}_i.$$

However,

$$uT = \sum_{i=1}^{n} \alpha_i v_iT \quad \text{and} \quad wT = \sum_{i=1}^{n} \beta_i v_iT,$$

whence by the orthonormality of the $v_iT$'s,

$$(uT, wT) = \sum_{i=1}^{n} \alpha_i \bar{\beta}_i = (u, w),$$

proving that $T$ is unitary.

Theorem 6.10.1 states that a change of basis from one orthonormal basis
to another is accomplished by a unitary linear transformation.

LEMMA 6.10.3 If $T \in A(V)$ then given any $v \in V$ there exists an element
$w \in V$, depending on $v$ and $T$, such that $(uT, v) = (u, w)$ for all $u \in V$. This
element $w$ is uniquely determined by $v$ and $T$.

Proof. To prove the lemma, it is sufficient to exhibit a $w \in V$ which
works for all the elements of a basis of $V$. Let $\{u_1, \ldots, u_n\}$ be an
orthonormal basis of $V$; we define

$$w = \sum_{i=1}^{n} \overline{(u_iT, v)}\, u_i.$$

An easy computation shows that $(u_i, w) = (u_iT, v)$, hence the element $w$
has the desired property. That $w$ is unique can be seen as follows: Suppose
that $(uT, v) = (u, w_1) = (u, w_2)$; then $(u, w_1 - w_2) = 0$ for all $u \in V$,
which forces, on putting $u = w_1 - w_2$, $w_1 = w_2$.

Lemma 6.10.3 allows us to make the

DEFINITION If $T \in A(V)$ then the Hermitian adjoint of $T$, written as $T^*$,
is defined by $(uT, v) = (u, vT^*)$ for all $u, v \in V$.

Given $v \in V$ we have obtained above an explicit expression for $vT^*$ (as
$w$) and we could use this expression to prove the various desired properties
of $T^*$. However, we prefer to do it in a "basis-free" way.

LEMMA 6.10.4 If $T \in A(V)$ then $T^* \in A(V)$. Moreover,

1. $(T^*)^* = T$;

2. $(S + T)^* = S^* + T^*$;

3. $(\lambda S)^* = \bar{\lambda} S^*$;

4. $(ST)^* = T^*S^*$;

for all $S, T \in A(V)$ and all $\lambda \in F$.
Proof. We must first prove that $T^*$ is a linear transformation on $V$. If
$u, v, w$ are in $V$, then $(u, (v + w)T^*) = (uT, v + w) = (uT, v) + (uT, w) =
(u, vT^*) + (u, wT^*) = (u, vT^* + wT^*)$, in consequence of which
$(v + w)T^* = vT^* + wT^*$. Similarly, for $\lambda \in F$, $(u, (\lambda v)T^*) = (uT, \lambda v) =
\bar{\lambda}(uT, v) = \bar{\lambda}(u, vT^*) = (u, \lambda(vT^*))$, whence $(\lambda v)T^* = \lambda(vT^*)$. We have
thus proved that $T^*$ is a linear transformation on $V$.

To see that $(T^*)^* = T$ notice that $(u, v(T^*)^*) = (uT^*, v) = \overline{(v, uT^*)} =
\overline{(vT, u)} = (u, vT)$ for all $u, v \in V$, whence $v(T^*)^* = vT$, which implies that
$(T^*)^* = T$. We leave the proofs of $(S + T)^* = S^* + T^*$ and of $(\lambda T)^* =
\bar{\lambda}T^*$ to the reader. Finally, $(u, v(ST)^*) = (uST, v) = (uS, vT^*) =
(u, vT^*S^*)$ for all $u, v \in V$; this forces $v(ST)^* = vT^*S^*$ for every $v \in V$,
which results in $(ST)^* = T^*S^*$.

As a consequence of the lemma the Hermitian adjoint defines an adjoint, 
in the sense of Section 6.8, on A(V). 

The Hermitian adjoint allows us to give an alternative description for 
unitary transformations in terms of the relation of T and T*. 

LEMMA 6.10.5 $T \in A(V)$ is unitary if and only if $TT^* = 1$.

Proof. If $T$ is unitary, then for all $u, v \in V$, $(u, vTT^*) = (uT, vT) =
(u, v)$, hence $TT^* = 1$. On the other hand, if $TT^* = 1$, then $(u, v) =
(u, vTT^*) = (uT, vT)$, which implies that $T$ is unitary.

Note that a unitary transformation is nonsingular and its inverse is just 
its Hermitian adjoint. Note, too, that from TT* = 1 we must have that 
T* T = 1. We shall soon give an explicit matrix criterion that a linear 
transformation be unitary. 

THEOREM 6.10.2 If $\{v_1, \ldots, v_n\}$ is an orthonormal basis of $V$ and if the
matrix of $T \in A(V)$ in this basis is $(\alpha_{ij})$ then the matrix of $T^*$ in this basis is
$(\beta_{ij})$, where $\beta_{ij} = \bar{\alpha}_{ji}$.

Proof. Since the matrices of $T$ and $T^*$ in this basis are, respectively,
$(\alpha_{ij})$ and $(\beta_{ij})$, then

$$v_iT = \sum_{j=1}^{n} \alpha_{ij} v_j \quad \text{and} \quad v_iT^* = \sum_{j=1}^{n} \beta_{ij} v_j.$$

Now

$$\beta_{ij} = (v_iT^*, v_j) = (v_i, v_jT) = \Bigl(v_i, \sum_{k=1}^{n} \alpha_{jk} v_k\Bigr) = \bar{\alpha}_{ji}$$

by the orthonormality of the $v_i$'s. This proves the theorem.

This theorem is very interesting to us in light of what we did earlier in 
Section 6.8. For the abstract Hermitian adjoint defined on the inner-product 
space V, when translated into matrices in an orthonormal basis of V, becomes
nothing more than the explicit, concrete Hermitian adjoint we defined 
there for matrices. 
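Theorem 6.10.2 can be tested numerically in coordinates. The following is a sketch under the conventions of this section: vectors are rows, a transformation acts on the right, and the inner product is conjugate-linear in its second argument. The function names and the sample matrix are illustrative only.

```python
def inner(u, v):
    """Inner product on row vectors: linear in u, conjugate-linear in v."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, v))


def act(v, m):
    """Row vector times matrix: (vM)_j = sum over i of v_i m_ij."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def adjoint(m):
    """Conjugate transpose: entry (i, j) is the conjugate of m[j][i]."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


T = [[1 + 2j, 3], [1j, 4 - 1j]]
u = [2 - 1j, 1j]
v = [1, 1 + 1j]

lhs = inner(act(u, T), v)             # (uT, v)
rhs = inner(u, act(v, adjoint(T)))    # (u, vT*)
print(abs(lhs - rhs) < 1e-12)
```

The two sides agree for every choice of `u` and `v`, which is exactly the defining relation $(uT, v) = (u, vT^*)$ read off in an orthonormal basis.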

Using the matrix representation in an orthonormal basis, we claim that
$T \in A(V)$ is unitary if and only if, whenever $(\alpha_{ij})$ is the matrix of $T$ in this
orthonormal basis, then

$$\sum_{i=1}^{n} \alpha_{ij}\bar{\alpha}_{ik} = 0 \quad \text{for } j \neq k$$

while

$$\sum_{i=1}^{n} |\alpha_{ij}|^2 = 1.$$

In terms of dot products on complex vector spaces, these relations say that the
columns of the matrix of $T$ form an orthonormal set of vectors in $F^{(n)}$ under the
dot product; since $TT^* = 1$ holds along with $T^*T = 1$, the same is true of
the rows.
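The matrix criterion is easy to test by machine. The sketch below checks both $MM^* = 1$ (orthonormal rows) and $M^*M = 1$ (orthonormal columns) up to a floating-point tolerance; the helper names and the two sample matrices are assumptions made for illustration.

```python
import math


def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_identity(m, tol=1e-12):
    n = len(m)
    return all(abs(m[i][j] - (1 if i == j else 0)) < tol
               for i in range(n) for j in range(n))


def is_unitary(m):
    """True when M M* = 1 (orthonormal rows) and M* M = 1 (orthonormal columns)."""
    return (is_identity(mat_mul(m, adjoint(m)))
            and is_identity(mat_mul(adjoint(m), m)))


s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s * 1j, s]]        # rows (s, is) and (is, s)
M = [[1, 1], [0, 1]]                  # invertible but not unitary

print(is_unitary(U), is_unitary(M))
```

The second matrix is invertible with determinant 1 yet fails the test, which shows that the criterion is genuinely stronger than nonsingularity.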


DEFINITION $T \in A(V)$ is called self-adjoint or Hermitian if $T^* = T$.


If $T^* = -T$ we call $T$ skew-Hermitian. Given any $S \in A(V)$,

$$S = \frac{S + S^*}{2} + i\left(\frac{S - S^*}{2i}\right),$$

and since $(S + S^*)/2$ and $(S - S^*)/2i$ are Hermitian, $S = A + iB$ where
both $A$ and $B$ are Hermitian.
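The decomposition $S = A + iB$ can be carried out entrywise on a matrix. The sketch below does so for one arbitrarily chosen complex matrix; `adjoint` and `close` are illustrative helpers, and the comparison uses a small tolerance because the arithmetic is floating point.

```python
def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))


S = [[1 + 2j, 3 - 1j], [4j, 2]]
S_star = adjoint(S)

# A = (S + S*)/2 and B = (S - S*)/2i, computed entry by entry.
A = [[(S[i][j] + S_star[i][j]) / 2 for j in range(2)] for i in range(2)]
B = [[(S[i][j] - S_star[i][j]) / 2j for j in range(2)] for i in range(2)]
recombined = [[A[i][j] + 1j * B[i][j] for j in range(2)] for i in range(2)]

print(close(A, adjoint(A)), close(B, adjoint(B)), close(recombined, S))
```

Both parts come out Hermitian and recombine to $S$, in analogy with writing a complex number as $a + ib$ with $a$ and $b$ real.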

In Section 6.8, using matrix calculations, we proved that any complex 
characteristic root of a Hermitian matrix is real; in light of Fact 1, this can 
be changed to read: Every characteristic root of a Hermitian matrix is real. 
We now re-prove this from the more uniform point of view of an inner- 
product space. 


THEOREM 6.10.3 If $T \in A(V)$ is Hermitian, then all its characteristic roots
are real.

Proof. Let $\lambda$ be a characteristic root of $T$; thus there is a $v \neq 0$ in $V$
such that $vT = \lambda v$. We compute: $\lambda(v, v) = (\lambda v, v) = (vT, v) = (v, vT^*) =
(v, vT) = (v, \lambda v) = \bar{\lambda}(v, v)$; since $(v, v) \neq 0$ we are left with $\lambda = \bar{\lambda}$, hence
$\lambda$ is real.

We want to describe canonical forms for unitary, Hermitian, and even 
more general types of linear transformations which will be even simpler 
than the Jordan form. This accounts for the next few lemmas which, 
although of independent interest, are for the most part somewhat technical 
in nature. 
LEMMA 6.10.6 If $S \in A(V)$ and if $vSS^* = 0$, then $vS = 0$.

Proof. Consider $(vSS^*, v)$; since $vSS^* = 0$, $0 = (vSS^*, v) = (vS, v(S^*)^*) =
(vS, vS)$ by Lemma 6.10.4. In an inner-product space, this implies that
$vS = 0$.


COROLLARY If $T$ is Hermitian and $vT^k = 0$ for $k \geq 1$ then $vT = 0$.

Proof. We show that if $vT^{2^m} = 0$ then $vT = 0$; for if $S = T^{2^{m-1}}$, then
$S^* = S$ and $SS^* = T^{2^m}$, whence $(vSS^*, v) = 0$ implies that $0 = vS =
vT^{2^{m-1}}$. Continuing down in this way, we obtain $vT = 0$. If $vT^k = 0$,
then $vT^{2^m} = 0$ for $2^m > k$, hence $vT = 0$.

We introduce a class of linear transformations which contains, as special 
cases, the unitary, Hermitian and skew-Hermitian transformations. 

DEFINITION $T \in A(V)$ is said to be normal if $TT^* = T^*T$.

Instead of proving the theorems to follow for unitary and Hermitian 
transformations separately, we shall, instead, prove them for normal linear 
transformations and derive, as corollaries, the desired results for the unitary 
and Hermitian ones. 

LEMMA 6.10.7 If $N$ is a normal linear transformation and if $vN = 0$ for
$v \in V$, then $vN^* = 0$.

Proof. Consider $(vN^*, vN^*)$; by definition, $(vN^*, vN^*) = (vN^*N, v) =
(vNN^*, v)$, since $NN^* = N^*N$. However, $vN = 0$, whence, certainly,
$vNN^* = 0$. In this way we obtain that $(vN^*, vN^*) = 0$, forcing $vN^* = 0$.

COROLLARY 1 If $\lambda$ is a characteristic root of the normal transformation $N$
and if $vN = \lambda v$ then $vN^* = \bar{\lambda} v$.

Proof. Since $N$ is normal, $NN^* = N^*N$, therefore, $(N - \lambda)(N - \lambda)^* =
(N - \lambda)(N^* - \bar{\lambda}) = NN^* - \lambda N^* - \bar{\lambda}N + \lambda\bar{\lambda} = N^*N - \lambda N^* - \bar{\lambda}N +
\lambda\bar{\lambda} = (N^* - \bar{\lambda})(N - \lambda) = (N - \lambda)^*(N - \lambda)$, that is to say, $N - \lambda$ is
normal. Since $v(N - \lambda) = 0$, by the normality of $N - \lambda$, from the lemma,
$v(N - \lambda)^* = 0$, hence $vN^* = \bar{\lambda} v$.

The corollary states the interesting fact that if $\lambda$ is a characteristic root of
the normal transformation $N$ not only is $\bar{\lambda}$ a characteristic root of $N^*$ but
any characteristic vector of $N$ belonging to $\lambda$ is a characteristic vector of
$N^*$ belonging to $\bar{\lambda}$ and vice versa.

COROLLARY 2 If $T$ is unitary and if $\lambda$ is a characteristic root of $T$, then
$|\lambda| = 1$.

Proof. Since $T$ is unitary it is normal. Let $\lambda$ be a characteristic root of
$T$ and suppose that $vT = \lambda v$ with $v \neq 0$ in $V$. By Corollary 1, $vT^* = \bar{\lambda} v$,
thus $v = vTT^* = \lambda vT^* = \lambda\bar{\lambda} v$ since $TT^* = 1$. Thus we get $\lambda\bar{\lambda} = 1$,
which, of course, says that $|\lambda| = 1$.
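Corollaries 1 and 2 can both be observed on a small unitary matrix. The sketch below uses the cyclic permutation matrix `P` as an assumed example (it is not taken from the text), with vectors written as rows and acted on from the right. Its characteristic vectors are built from a primitive cube root of unity $\omega$.

```python
import cmath


def act(v, m):
    """Row vector times matrix: (vM)_j = sum over i of v_i m_ij."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def close(u, w, tol=1e-12):
    return all(abs(a - b) < tol for a, b in zip(u, w))


P = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]     # unitary, hence normal
omega = cmath.exp(2j * cmath.pi / 3)      # primitive cube root of unity
v = [1, omega, omega ** 2]
lam = omega ** 2                          # vP = lam * v

is_char_vector = close(act(v, P), [lam * x for x in v])
adjoint_has_conjugate_root = close(act(v, adjoint(P)),
                                   [lam.conjugate() * x for x in v])
root_on_unit_circle = abs(abs(lam) - 1) < 1e-12

print(is_char_vector, adjoint_has_conjugate_root, root_on_unit_circle)
```

The same vector `v` serves as a characteristic vector of both `P` and its adjoint, belonging to conjugate roots, and the root has absolute value 1.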

We pause to see where we are going. Our immediate goal is to prove that
a normal transformation $N$ can be brought to diagonal form by a unitary
one. If $\lambda_1, \ldots, \lambda_k$ are the distinct characteristic roots of $N$, using Theorem
6.6.1 we can decompose $V$ as $V = V_1 \oplus \cdots \oplus V_k$, where for $v_i \in V_i$,
$v_i(N - \lambda_i)^{n_i} = 0$. Accordingly, we want to study two things, namely, the
relation of vectors lying in different $V_i$'s and the very nature of each $V_i$.
When these have been determined, we will be able to assemble them to
prove the desired theorem.

LEMMA 6.10.8 If $N$ is normal and if $vN^k = 0$, then $vN = 0$.

Proof. Let $S = NN^*$; $S$ is Hermitian, and by the normality of $N$,
$vS^k = v(NN^*)^k = vN^k(N^*)^k = 0$. By the corollary to Lemma 6.10.6, we
deduce that $vS = 0$, that is to say, $vNN^* = 0$. Invoking Lemma 6.10.6
itself yields $vN = 0$.

COROLLARY If $N$ is normal and if for $\lambda \in F$, $v(N - \lambda)^k = 0$, then
$vN = \lambda v$.

Proof. From the normality of $N$ it follows that $N - \lambda$ is normal, whence
by applying the lemma just proved to $N - \lambda$ we obtain the corollary.

In line with the discussion just preceding the last lemma, this corollary
shows that every vector in $V_i$ is a characteristic vector of $N$ belonging to the
characteristic root $\lambda_i$. We have determined the nature of $V_i$; now we proceed to
investigate the interrelation between two distinct $V_i$'s.

LEMMA 6.10.9 Let $N$ be a normal transformation and suppose that $\lambda$ and
$\mu$ are two distinct characteristic roots of $N$. If $v, w$ are in $V$ and are such that
$vN = \lambda v$, $wN = \mu w$, then $(v, w) = 0$.

Proof. We compute $(vN, w)$ in two different ways. As a consequence
of $vN = \lambda v$, $(vN, w) = (\lambda v, w) = \lambda(v, w)$. From $wN = \mu w$, using Corollary 1
to Lemma 6.10.7 we obtain that $wN^* = \bar{\mu} w$, whence $(vN, w) = (v, wN^*) =
(v, \bar{\mu} w) = \mu(v, w)$. Comparing the two computations gives us $\lambda(v, w) = \mu(v, w)$
and since $\lambda \neq \mu$, this results in $(v, w) = 0$.

All the background work has been done to enable us to prove the basic 
and lovely 
THEOREM 6.10.4 If $N$ is a normal linear transformation on $V$, then there exists
an orthonormal basis, consisting of characteristic vectors of $N$, in which the matrix of
$N$ is diagonal. Equivalently, if $N$ is a normal matrix there exists a unitary matrix
$U$ such that $UNU^{-1}$ $(= UNU^*)$ is diagonal.

Proof. We fill in the informal sketch we have made of the proof prior 
to proving Lemma 6.10.8. 

Let $N$ be normal and let $\lambda_1, \ldots, \lambda_k$ be the distinct characteristic roots
of $N$. By the corollary to Theorem 6.6.1 we can decompose $V =
V_1 \oplus \cdots \oplus V_k$ where every $v_i \in V_i$ is annihilated by $(N - \lambda_i)^{n_i}$. By the
corollary to Lemma 6.10.8, $V_i$ consists only of characteristic vectors of $N$
belonging to the characteristic root $\lambda_i$. The inner product of $V$ induces an
inner product on $V_i$; by Theorem 4.4.2 we can find a basis of $V_i$ orthonormal
relative to this inner product.

By Lemma 6.10.9 elements lying in distinct $V_i$'s are orthogonal. Thus
putting together the orthonormal bases of the $V_i$'s provides us with an
orthonormal basis of $V$. This basis consists of characteristic vectors of $N$,
hence in this basis the matrix of $N$ is diagonal.

We do not prove the matrix equivalent, leaving it as a problem; we only 
point out that two facts are needed: 

1. A change of basis from one orthonormal basis to another is accomplished 
by a unitary transformation (Theorem 6.10.1). 

2. In a change of basis the matrix of a linear transformation is changed 
by conjugating by the matrix of the change of basis (Theorem 6.3.2). 
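The matrix form of Theorem 6.10.4 can be illustrated on a small example. In the sketch below the normal matrix `N` and the unitary matrix `U`, whose rows are orthonormal characteristic (row) vectors of `N`, are assumptions chosen by hand for illustration; the code only verifies that $UNU^*$ is diagonal.

```python
import math


def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))


N = [[1, 1], [-1, 1]]                     # N N* = N* N = 2, so N is normal
s = 1 / math.sqrt(2)
U = [[s, s * 1j], [s, -s * 1j]]           # rows: characteristic vectors of N

is_normal = close(mat_mul(N, adjoint(N)), mat_mul(adjoint(N), N))
is_unitary = close(mat_mul(U, adjoint(U)), [[1, 0], [0, 1]])
D = mat_mul(mat_mul(U, N), adjoint(U))    # U N U*

print(is_normal, is_unitary, close(D, [[1 - 1j, 0], [0, 1 + 1j]]))
```

Here `N` is real but neither symmetric nor orthogonal, and its characteristic roots $1 \pm i$ are neither real nor of absolute value 1, so the example falls under the theorem itself rather than under either of its special cases.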

Both corollaries to follow are very special cases of Theorem 6.10.4, but 
since each is so important in its own right we list them as corollaries in order 
to emphasize them. 

COROLLARY 1 If $T$ is a unitary transformation, then there is an orthonormal
basis in which the matrix of $T$ is diagonal; equivalently, if $T$ is a unitary matrix,
then there is a unitary matrix $U$ such that $UTU^{-1}$ $(= UTU^*)$ is diagonal.

COROLLARY 2 If $T$ is a Hermitian linear transformation, then there exists an
orthonormal basis in which the matrix of $T$ is diagonal; equivalently, if $T$ is a Hermitian
matrix, then there exists a unitary matrix $U$ such that $UTU^{-1}$ $(= UTU^*)$ is
diagonal.

The theorem proved is the basic result for normal transformations, for it
sharply characterizes them as precisely those transformations which can 
be brought to diagonal form by unitary ones. It also shows that the distinc¬ 
tion between normal, Hermitian, and unitary transformations is merely a 
distinction caused by the nature of their characteristic roots. This is made 
precise in 
LEMMA 6.10.10 The normal transformation $N$ is

1. Hermitian if and only if its characteristic roots are real.

2. Unitary if and only if its characteristic roots are all of absolute value 1.

Proof. We argue using matrices. If $N$ is Hermitian, then it is normal and
all its characteristic roots are real. If $N$ is normal and has only real
characteristic roots, then for some unitary matrix $U$, $UNU^{-1} = UNU^* = D$
where $D$ is a diagonal matrix with real entries on the diagonal. Thus
$D^* = D$; since $D^* = (UNU^*)^* = UN^*U^*$, the relation $D^* = D$ implies
$UN^*U^* = UNU^*$, and since $U$ is invertible we obtain $N^* = N$. Thus $N$
is Hermitian.

We leave the proof of the part about unitary transformations to the reader. 

If $A$ is any linear transformation on $V$, then $\operatorname{tr}(AA^*)$ can be computed
by using the matrix representation of $A$ in any basis of $V$. We pick an
orthonormal basis of $V$; in this basis, if the matrix of $A$ is $(\alpha_{ij})$ then that of
$A^*$ is $(\beta_{ij})$ where $\beta_{ij} = \bar{\alpha}_{ji}$. A simple computation then shows that
$\operatorname{tr}(AA^*) = \sum_{i,j} |\alpha_{ij}|^2$ and this is 0 if and only if each $\alpha_{ij} = 0$, that is, if
and only if $A = 0$. In a word, $\operatorname{tr}(AA^*) = 0$ if and only if $A = 0$. This is a
useful criterion for showing that a given linear transformation is 0. This
is illustrated in
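The "simple computation" can be reproduced on a sample matrix. The sketch below compares the trace of $AA^*$ with the sum of the squared absolute values of the entries; the matrix and the helper names are illustrative assumptions.

```python
def adjoint(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[1 + 1j, 2], [0, -3j]]
AA_star = mat_mul(A, adjoint(A))

trace = sum(AA_star[i][i] for i in range(len(A)))
sum_of_squares = sum(abs(x) ** 2 for row in A for x in row)

print(abs(trace - sum_of_squares) < 1e-12)
```

Since the sum of squares is a sum of nonnegative real terms, it can vanish only when every entry is 0, which is the criterion used in the proof of the lemma that follows.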


LEMMA 6.10.11 If $N$ is normal and $AN = NA$, then $AN^* = N^*A$.

Proof. We want to show that $X = AN^* - N^*A$ is 0; what we shall
do is prove that $\operatorname{tr} XX^* = 0$, and deduce from this that $X = 0$.

Since $N$ commutes with $A$ and with $N^*$, it must commute with $AN^* -
N^*A$, thus $XX^* = (AN^* - N^*A)(NA^* - A^*N) = (AN^* - N^*A)NA^* -
(AN^* - N^*A)A^*N = N\{(AN^* - N^*A)A^*\} - \{(AN^* - N^*A)A^*\}N$.
Being of the form $NB - BN$, the trace of $XX^*$ is 0. Thus $X = 0$ and
$AN^* = N^*A$.


We have just seen that N* commutes with all the linear transformations 
that commute with N, when N is normal; this is enough to force N* to be a 
polynomial expression in N. However, this can be shown directly as a 
consequence of Theorem 6.10.4 (see Problem 14). 

The linear transformation $T$ is Hermitian if and only if $(vT, v)$ is real
for every $v \in V$. (See Problem 19.) Of special interest are those Hermitian
linear transformations for which $(vT, v) \geq 0$ for all $v \in V$. We call these
nonnegative linear transformations and denote the fact that a linear
transformation is nonnegative by writing $T \geq 0$. If $T \geq 0$ and in addition
$(vT, v) > 0$ for $v \neq 0$ then we call $T$ positive (or positive definite) and write
$T > 0$. We wish to distinguish these linear transformations by their
characteristic roots.
LEMMA 6.10.12 The Hermitian linear transformation $T$ is nonnegative
(positive) if and only if all of its characteristic roots are nonnegative (positive).

Proof. Suppose that $T \geq 0$; if $\lambda$ is a characteristic root of $T$, then
$vT = \lambda v$ for some $v \neq 0$. Thus $0 \leq (vT, v) = (\lambda v, v) = \lambda(v, v)$; since
$(v, v) > 0$ we deduce that $\lambda \geq 0$.

Conversely, if $T$ is Hermitian with nonnegative characteristic roots, then
we can find an orthonormal basis $\{v_1, \ldots, v_n\}$ consisting of characteristic
vectors of $T$. For each $v_i$, $v_iT = \lambda_i v_i$, where $\lambda_i \geq 0$. Given $v \in V$,
$v = \sum \alpha_i v_i$, hence $vT = \sum \alpha_i v_iT = \sum \lambda_i \alpha_i v_i$. But $(vT, v) =
(\sum \lambda_i \alpha_i v_i, \sum \alpha_i v_i) = \sum \lambda_i \alpha_i \bar{\alpha}_i$ by the orthonormality of the $v_i$'s.
Since $\lambda_i \geq 0$ and $\alpha_i \bar{\alpha}_i \geq 0$, we get that $(vT, v) \geq 0$, hence $T \geq 0$.

The corresponding "positive" results are left as an exercise.

LEMMA 6.10.13 $T \geq 0$ if and only if $T = AA^*$ for some $A$.

Proof. We first show that $AA^* \geq 0$. Given $v \in V$, $(vAA^*, v) =
(vA, vA) \geq 0$, hence $AA^* \geq 0$.

On the other hand, if $T \geq 0$ we can find a unitary matrix $U$ such that

$$UTU^* = \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix}$$

where each $\lambda_i$ is a characteristic root of $T$, hence each $\lambda_i \geq 0$. Let

$$S = \begin{pmatrix} \sqrt{\lambda_1} & & \\ & \ddots & \\ & & \sqrt{\lambda_n} \end{pmatrix};$$

since each $\lambda_i \geq 0$, each $\sqrt{\lambda_i}$ is real, whence $S$ is Hermitian. Therefore,
$U^*SU$ is Hermitian; but

$$(U^*SU)^2 = U^*S^2U = U^* \begin{pmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{pmatrix} U = T.$$

We have represented $T$ in the form $AA^*$, where $A = U^*SU$.

Notice that we have actually proved a little more; namely, if in
constructing $S$ above, we had chosen the nonnegative $\sqrt{\lambda_i}$ for each $\lambda_i$, then $S$, and
$U^*SU$, would have been nonnegative. Thus $T \geq 0$ is the square of a
nonnegative linear transformation; that is, every $T \geq 0$ has a nonnegative
square root. This nonnegative square root can be shown to be unique (see
Problem 24).
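The construction in the proof of Lemma 6.10.13 can be followed step by step on a real symmetric example. In the sketch below the matrix `T` and the orthogonal matrix `U`, whose rows are orthonormal characteristic vectors of `T`, are assumptions chosen by hand; since everything is real, the adjoint is just the transpose.

```python
import math


def transpose(m):
    n = len(m)
    return [[m[j][i] for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))


T = [[2, 1], [1, 2]]                      # characteristic roots 1 and 3
s = 1 / math.sqrt(2)
U = [[s, -s], [s, s]]                     # rows: (1,-1)/sqrt 2 and (1,1)/sqrt 2

D = mat_mul(mat_mul(U, T), transpose(U))  # U T U*, diagonal with entries 1, 3
S = [[math.sqrt(D[0][0]), 0], [0, math.sqrt(D[1][1])]]
A = mat_mul(mat_mul(transpose(U), S), U)  # A = U* S U
A_squared = mat_mul(A, A)

print(close(D, [[1, 0], [0, 3]]), close(A_squared, T), close(A, transpose(A)))
```

Because `A` is symmetric, $AA^* = A^2 = T$, so `A` is at once the matrix required by the lemma and the nonnegative square root of `T`.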

We close this section with a discussion of unitary and Hermitian matrices 
over the real field. In this case, the unitary matrices are called orthogonal, and 
Let $Q$ be a real $2 \times 2$ orthogonal matrix satisfying $Q^2 - \lambda Q + 1 = 0$;
suppose that $Q = \begin{pmatrix} \alpha & \beta \\ \gamma & \delta \end{pmatrix}$. The orthogonality of $Q$ implies

$$\alpha^2 + \beta^2 = 1; \tag{1}$$

$$\gamma^2 + \delta^2 = 1; \tag{2}$$

$$\alpha\gamma + \beta\delta = 0; \tag{3}$$

since $Q^2 - \lambda Q + 1 = 0$, the determinant of $Q$ is 1, hence

$$\alpha\delta - \beta\gamma = 1. \tag{4}$$

We claim that equations (1)-(4) imply that $\alpha = \delta$, $\beta = -\gamma$. Since
$\alpha^2 + \beta^2 = 1$, $|\alpha| \leq 1$, whence we can write $\alpha = \cos\theta$ for some real angle
$\theta$; in these terms $\beta = \sin\theta$. Therefore, the matrix $Q$ looks like

$$\begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}.$$

All the spaces used in all our decompositions were mutually orthogonal,
thus by picking orthonormal bases of each of these we obtain an orthonormal
basis of $V$. In this basis the matrix of $Q$ is as shown in Figure 6.10.1.

Figure 6.10.1 (block-diagonal matrix with $2 \times 2$ diagonal blocks
$\begin{pmatrix} \cos\theta_i & \sin\theta_i \\ -\sin\theta_i & \cos\theta_i \end{pmatrix}$ for $i = 1, \ldots, r$)

Since we have gone from one orthonormal basis to another, and since
this is accomplished by an orthogonal matrix, given a real orthogonal
matrix $Q$ we can find an orthogonal matrix $T$ such that $TQT^{-1}$ $(= TQT^*)$ is
of the form just described.
Problems

1. Determine which of the following matrices are unitary, Hermitian,
normal.



/! 0 
I 0 0 
0 1 
\0 0 


0 

1 

0 

0 



0 

1 

y/2 

1 

y/2 


1] 

V2 



2. For those matrices in Problem 1 which are normal, find their
characteristic roots and bring them to diagonal form by a unitary matrix.

3. If $T$ is unitary, just using the definition $(vT, uT) = (v, u)$, prove
that $T$ is nonsingular.

4. If $Q$ is a real orthogonal matrix, prove that $\det Q = \pm 1$.

5. If $Q$ is a real symmetric matrix satisfying $Q^k = 1$ for $k \geq 1$, prove
that $Q^2 = 1$.

6. Complete the proof of Lemma 6.10.4 by showing that $(S + T)^* =
S^* + T^*$ and $(\lambda T)^* = \bar{\lambda}T^*$.

7. Prove the properties of $*$ in Lemma 6.10.4 by making use of the explicit
form of $w = vT^*$ given in the proof of Lemma 6.10.3.

8. If $T$ is skew-Hermitian, prove that all of its characteristic roots are
pure imaginaries.

9. If $T$ is a real, skew-symmetric $n \times n$ matrix, prove that if $n$ is odd,
then $\det T = 0$.

10. By a direct matrix calculation, prove that a real, $2 \times 2$ symmetric
matrix can be brought to diagonal form by an orthogonal one.

11. Complete the proof outlined for the matrix-equivalent part of Theorem
6.10.4.

12. Prove that a normal transformation is unitary if and only if the
characteristic roots are all of absolute value 1.

13. If $N_1, \ldots, N_k$ is a finite number of commuting normal transformations,
prove that there exists a unitary transformation $T$ such that all of
$TN_iT^{-1}$ are diagonal.
14. If $N$ is normal, prove that $N^* = p(N)$ for some polynomial $p(x)$.

15. If N is normal and if AN = 0, prove that AN* = 0. 

16. Prove that A is normal if and only if A commutes with AA*. 

17. If $N$ is normal prove that $N = \sum \lambda_i E_i$ where $E_i^2 = E_i$, $E_i^* = E_i$,
and the $\lambda_i$'s are the characteristic roots of $N$. (This is called the spectral
resolution of $N$.)

18. If N is a normal transformation on V and if f (x) and g(x) are two 
relatively prime polynomials with real coefficients, prove that if 
vf (N) =0 and wg(N ) = 0, for v, w in V, then (v, w) = 0. 

19. Prove that a linear transformation $T$ on $V$ is Hermitian if and only if
$(vT, v)$ is real for all $v \in V$.

20. Prove that $T > 0$ if and only if $T$ is Hermitian and has all its
characteristic roots positive.

21. If $A \ge 0$ and $(vA, v) = 0$, prove that $vA = 0$.

22. (a) If $A \ge 0$ and $A^2$ commutes with the Hermitian transformation
$B$ then $A$ commutes with $B$.

(b) Prove part (a) even if $B$ is not Hermitian.

23. If $A \ge 0$ and $B \ge 0$ and $AB = BA$, prove that $AB \ge 0$.

24. Prove that if $A \ge 0$ then $A$ has a unique nonnegative square root.

25. Let $A = (\alpha_{ij})$ be a real, symmetric $n \times n$ matrix. Let
\[
A_s = \begin{pmatrix} \alpha_{11} & \cdots & \alpha_{1s} \\ \vdots & & \vdots \\ \alpha_{s1} & \cdots & \alpha_{ss} \end{pmatrix}.
\]

(a) If $A > 0$, prove that $A_s > 0$ for $s = 1, 2, \ldots, n$.

(b) If $A > 0$ prove that $\det A_s > 0$ for $s = 1, 2, \ldots, n$.

(c) If $\det A_s > 0$ for $s = 1, 2, \ldots, n$, prove that $A > 0$.

(d) If $A \ge 0$ prove that $A_s \ge 0$ for $s = 1, 2, \ldots, n$.

(e) If $A \ge 0$ prove that $\det A_s \ge 0$ for $s = 1, 2, \ldots, n$.

(f) Give an example of an $A$ such that $\det A_s \ge 0$ for all $s = 1, 2,
\ldots, n$ yet $A$ is not nonnegative.

26. Prove that any complex matrix can be brought to triangular form
by a unitary matrix.


6.11 Real Quadratic Forms 

We close the chapter with a brief discussion of quadratic forms over the 
field of real numbers. 

Let $V$ be a real inner-product space and suppose that $A$ is a (real)
symmetric linear transformation on $V$. The real-valued function $Q(v)$ defined
on $V$ by $Q(v) = (vA, v)$ is called the quadratic form associated with $A$.

If we consider, as we may without loss of generality, that $A$ is a real,
$n \times n$ symmetric matrix $(\alpha_{ij})$ acting on $F^{(n)}$ and that the inner product for
$(\delta_1, \ldots, \delta_n)$ and $(\gamma_1, \ldots, \gamma_n)$ in $F^{(n)}$ is the real number
$\delta_1\gamma_1 + \delta_2\gamma_2 + \cdots + \delta_n\gamma_n$, then for an arbitrary vector
$v = (x_1, \ldots, x_n)$ in $F^{(n)}$ a simple calculation shows that
\[
Q(v) = (vA, v) = \alpha_{11}x_1^2 + \cdots + \alpha_{nn}x_n^2 + 2\sum_{i<j} \alpha_{ij}x_ix_j.
\]

On the other hand, given any quadratic function in $n$ variables
\[
\gamma_{11}x_1^2 + \cdots + \gamma_{nn}x_n^2 + 2\sum_{i<j} \gamma_{ij}x_ix_j
\]
with real coefficients $\gamma_{ij}$, we clearly can realize it as the quadratic form
associated with the real symmetric matrix $C = (\gamma_{ij})$.
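
The "simple calculation" relating $(vA, v)$ to the expanded sum can be spot-checked numerically. The following is a minimal sketch, not part of the original text; the function names and the sample matrix are illustrative assumptions, and vectors are treated as rows acting on the left of $A$, as in the text.

```python
def quadratic_form_matrix(A, v):
    """Q(v) = (vA, v), with v a row vector and A a symmetric matrix."""
    n = len(v)
    vA = [sum(v[i] * A[i][j] for i in range(n)) for j in range(n)]
    return sum(vA[j] * v[j] for j in range(n))


def quadratic_form_expanded(A, v):
    """Diagonal terms plus twice the sum over i < j of A[i][j] * x_i * x_j."""
    n = len(v)
    diagonal = sum(A[i][i] * v[i] ** 2 for i in range(n))
    cross = sum(A[i][j] * v[i] * v[j]
                for i in range(n) for j in range(i + 1, n))
    return diagonal + 2 * cross


# Hypothetical sample data: a real symmetric 3 x 3 matrix and a vector.
A = [[1, 2, 0],
     [2, -3, 4],
     [0, 4, 5]]
v = [1, -2, 3]

same_value = quadratic_form_matrix(A, v) == quadratic_form_expanded(A, v)
```

The agreement relies on the symmetry of `A`; for a non-symmetric matrix the two functions would in general differ.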

In real $n$-dimensional Euclidean space such quadratic functions serve to
define the quadratic surfaces. For instance, in the real plane, the form
$\alpha x^2 + \beta xy + \gamma y^2$ gives rise to a conic section (possibly with its major axis
tilted). It is not too unnatural to expect that the geometric properties of
this conic section should be intimately related with the symmetric matrix
\[
\begin{pmatrix} \alpha & \beta/2 \\ \beta/2 & \gamma \end{pmatrix}
\]
with which its quadratic form is associated.

Let us recall that in elementary analytic geometry one proves that by a
suitable rotation of axes the equation $\alpha x^2 + \beta xy + \gamma y^2$ can, in the new
coordinate system, assume the form $\alpha_1(x')^2 + \gamma_1(y')^2$. Recall that
$\alpha_1 + \gamma_1 = \alpha + \gamma$ and $\alpha\gamma - \beta^2/4 = \alpha_1\gamma_1$. Thus $\alpha_1, \gamma_1$ are the
characteristic roots of the matrix
\[
\begin{pmatrix} \alpha & \beta/2 \\ \beta/2 & \gamma \end{pmatrix};
\]
the rotation of axes is just a change of basis by an orthogonal transformation,
and what we did in the geometry was merely to bring the symmetric matrix
to its diagonal form by an orthogonal matrix. The nature of $\alpha x^2 + \beta xy +
\gamma y^2$ as a conic was basically determined by the size and sign of its
characteristic roots $\alpha_1, \gamma_1$.
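
As an illustration of the rotation of axes, here is a small numerical sketch. It assumes the standard substitution $x = x'\cos\theta - y'\sin\theta$, $y = x'\sin\theta + y'\cos\theta$ with $\tan 2\theta = \beta/(\alpha - \gamma)$; the function name and the sample coefficients are hypothetical.

```python
import math


def rotate_conic(alpha, beta, gamma):
    """Coefficients of alpha*x^2 + beta*x*y + gamma*y^2 after rotating the axes
    by the angle theta with tan(2*theta) = beta / (alpha - gamma)."""
    theta = 0.5 * math.atan2(beta, alpha - gamma)
    c, s = math.cos(theta), math.sin(theta)
    alpha1 = alpha * c * c + beta * s * c + gamma * s * s
    gamma1 = alpha * s * s - beta * s * c + gamma * c * c
    beta1 = (gamma - alpha) * 2 * s * c + beta * (c * c - s * s)  # cross term
    return alpha1, beta1, gamma1


# Hypothetical example: 5x^2 + 4xy + 2y^2, with matrix [[5, 2], [2, 2]].
alpha, beta, gamma = 5.0, 4.0, 2.0
alpha1, beta1, gamma1 = rotate_conic(alpha, beta, gamma)

# The cross term vanishes, and trace and determinant are preserved.
trace_kept = math.isclose(alpha1 + gamma1, alpha + gamma)
det_kept = math.isclose(alpha1 * gamma1, alpha * gamma - beta ** 2 / 4)
```

For this example the rotated form is $6(x')^2 + (y')^2$, so the conic $5x^2 + 4xy + 2y^2 = 1$ is an ellipse; 6 and 1 are the characteristic roots of the associated matrix.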

A similar discussion can be carried out to classify quadric surfaces in
3-space, and, indeed quadric surfaces in $n$-space. What essentially
determines the geometric nature of the quadric surface associated with
\[
\alpha_{11}x_1^2 + \cdots + \alpha_{nn}x_n^2 + 2\sum_{i<j} \alpha_{ij}x_ix_j
\]
is the size and sign of the characteristic roots of the matrix $(\alpha_{ij})$. If we
were not interested in the relative flatness of the quadric surface (e.g., if we
consider an ellipse as a flattened circle), then we could ignore the size of the
nonzero characteristic roots and the determining factor for the shape of the
quadric surface would be the number of 0 characteristic roots and the
number of positive (and negative) ones.

These things motivate, and at the same time will be clarified in, the 
discussion that follows, which culminates in Sylvester’s law of inertia. 

Let $A$ be a real symmetric matrix and let us consider its associated
quadratic form $Q(v) = (vA, v)$. If $T$ is any nonsingular real linear
transformation, given $v \in F^{(n)}$, $v = wT$ for some $w \in F^{(n)}$, whence $(vA, v) =
(wTA, wT) = (wTAT', w)$. Thus $A$ and $TAT'$ effectively define the same
quadratic form. This prompts the

DEFINITION Two real symmetric matrices A and B are congruent if 
there is a nonsingular real matrix T such that B = TAT'. 

LEMMA 6.11.1 Congruence is an equivalence relation. 

Proof. Let us write, when $A$ is congruent to $B$, $A \cong B$.

1. $A \cong A$ for $A = IAI'$.

2. If $A \cong B$ then $B = TAT'$ where $T$ is nonsingular, hence $A = SBS'$
where $S = T^{-1}$. Thus $B \cong A$.

3. If $A \cong B$ and $B \cong C$ then $B = TAT'$ while $C = RBR'$, hence $C =
RTAT'R' = (RT)A(RT)'$, and so $A \cong C$.

Since the relation satisfies the defining conditions for an equivalence 
relation, the lemma is proved. 

The principal theorem concerning congruence is its characterization, 
contained in Sylvester’s law. 

THEOREM 6.11.1 Given the real symmetric matrix $A$ there is an invertible
matrix $T$ such that
\[
TAT' = \begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix}
\]
where $I_r$ and $I_s$ are respectively the $r \times r$ and $s \times s$ unit matrices and where $0_t$
is the $t \times t$ zero-matrix. The integers $r + s$, which is the rank of $A$, and $r - s$,
which is the signature of $A$, characterize the congruence class of $A$. That is, two real
symmetric matrices are congruent if and only if they have the same rank and signature.

Proof. Since $A$ is real symmetric its characteristic roots are all real; let
$\lambda_1, \ldots, \lambda_r$ be its positive characteristic roots, $-\lambda_{r+1}, \ldots, -\lambda_{r+s}$ its
negative ones. By the discussion at the end of Section 6.10 we can find a
real orthogonal matrix $C$ such that
\[
CAC^{-1} = CAC' = \begin{pmatrix}
\lambda_1 & & & & & & \\
 & \ddots & & & & & \\
 & & \lambda_r & & & & \\
 & & & -\lambda_{r+1} & & & \\
 & & & & \ddots & & \\
 & & & & & -\lambda_{r+s} & \\
 & & & & & & 0_t
\end{pmatrix},
\]
where $t = n - r - s$. Let $D$ be the real diagonal matrix shown in Figure
6.11.1.
\[
D = \begin{pmatrix}
\dfrac{1}{\sqrt{\lambda_1}} & & & & & & \\
 & \ddots & & & & & \\
 & & \dfrac{1}{\sqrt{\lambda_r}} & & & & \\
 & & & \dfrac{1}{\sqrt{\lambda_{r+1}}} & & & \\
 & & & & \ddots & & \\
 & & & & & \dfrac{1}{\sqrt{\lambda_{r+s}}} & \\
 & & & & & & I_t
\end{pmatrix}
\]

Figure 6.11.1

A simple computation shows that
\[
DCAC'D' = \begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix}.
\]
Thus there is a matrix of the required form in the congruence class of $A$.

Our task is now to show that this is the only matrix in the congruence
class of $A$ of this form, or, equivalently, that
\[
L = \begin{pmatrix} I_r & & \\ & -I_s & \\ & & 0_t \end{pmatrix}
\quad\text{and}\quad
M = \begin{pmatrix} I_{r'} & & \\ & -I_{s'} & \\ & & 0_{t'} \end{pmatrix}
\]
are congruent only if $r = r'$, $s = s'$, and $t = t'$.

Suppose that $M = TLT'$ where $T$ is invertible. By Lemma 6.1.3 the
rank of $M$ equals that of $L$; since the rank of $M$ is $n - t'$ while that of $L$
is $n - t$ we get $t = t'$.

Suppose that $r < r'$; since $n = r + s + t = r' + s' + t'$, and since
$t = t'$, we must have $s > s'$. Let $U$ be the subspace of $F^{(n)}$ of all vectors
having the first $r$ and last $t$ coordinates 0; $U$ is $s$-dimensional and for $u \ne 0$
in $U$, $(uL, u) < 0$.

Let $W$ be the subspace of $F^{(n)}$ for which the $r' + 1, \ldots, r' + s'$
components are all 0; on $W$, $(wM, w) \ge 0$ for any $w \in W$. Since $T$ is invertible,
and since $W$ is $(n - s')$-dimensional, $WT$ is $(n - s')$-dimensional. For
$w \in W$, $(wM, w) \ge 0$; hence $(wTLT', w) \ge 0$; that is, $(wTL, wT) \ge 0$.
Therefore, on $WT$, $(wTL, wT) \ge 0$ for all elements. Now $\dim(WT) +
\dim U = (n - s') + s = n + s - s' > n$; thus by the corollary to Lemma
4.2.6, $WT \cap U \ne 0$. This, however, is nonsense, for if $x \ne 0 \in WT \cap U$,
on one hand, being in $U$, $(xL, x) < 0$, while on the other, being in $WT$,
$(xL, x) \ge 0$. Thus $r = r'$ and so $s = s'$.

The rank, $r + s$, and signature, $r - s$, of course, determine $r$, $s$ and so
$t = (n - r - s)$, whence they determine the congruence class.
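
For small matrices the rank and signature can be computed mechanically. The sketch below is an assumption-laden illustration rather than the method of the proof: instead of orthogonal diagonalization it uses symmetric row-and-column elimination over the rationals, which replaces $A$ by congruent matrices $EAE'$ until a diagonal matrix is reached. By the theorem, the signs on that diagonal give $r$ and $s$. The function name `rank_and_signature` is hypothetical.

```python
from fractions import Fraction


def rank_and_signature(matrix):
    """Return (r + s, r - s) for a symmetric matrix with rational entries."""
    n = len(matrix)
    a = [[Fraction(x) for x in row] for row in matrix]

    def add_multiple(i, k, f):
        # Congruence step: row_i += f * row_k, then column_i += f * column_k.
        for j in range(n):
            a[i][j] += f * a[k][j]
        for j in range(n):
            a[j][i] += f * a[j][k]

    def swap(i, k):
        # Congruence step: exchange rows i, k and columns i, k.
        a[i], a[k] = a[k], a[i]
        for row in a:
            row[i], row[k] = row[k], row[i]

    for k in range(n):
        # Look for a nonzero diagonal entry in the block not yet diagonalized.
        pivot = next((i for i in range(k, n) if a[i][i] != 0), None)
        if pivot is None:
            # Every remaining diagonal entry is 0; use an off-diagonal entry.
            pair = next(((i, j) for i in range(k, n)
                         for j in range(i + 1, n) if a[i][j] != 0), None)
            if pair is None:
                break  # the remaining block is the zero matrix
            i, j = pair
            add_multiple(i, j, Fraction(1))  # now a[i][i] = 2 * a[i][j] != 0
            pivot = i
        if pivot != k:
            swap(pivot, k)
        for i in range(k + 1, n):
            if a[i][k] != 0:
                add_multiple(i, k, -a[i][k] / a[k][k])

    positives = sum(1 for i in range(n) if a[i][i] > 0)
    negatives = sum(1 for i in range(n) if a[i][i] < 0)
    return positives + negatives, positives - negatives


# The form xy has matrix [[0, 1/2], [1/2, 0]]: rank 2, signature 0.
hyperbolic = rank_and_signature([[0, Fraction(1, 2)], [Fraction(1, 2), 0]])
```

Exact rational arithmetic is used so that the signs of the diagonal entries are not disturbed by rounding.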


Problems 

1. Determine the rank and signature of the following real quadratic forms:

(a) $x_1^2 + 2x_1x_2 + x_2^2$.

(b) $x_1^2 + x_1x_2 + 2x_1x_3 + 2x_2^2 + 4x_2x_3 + 2x_3^2$.

2. If $A$ is a symmetric matrix with complex entries, prove we can find a
complex invertible matrix $B$ such that
\[
BAB' = \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix}
\]
and that $r$, the rank of $A$, determines the congruence class of $A$ relative
to complex congruence.

3. If $F$ is a field of characteristic different from 2, given $A \in F_n$, prove that
there exists a $B \in F_n$ such that $BAB'$ is diagonal.

4. Prove the result of Problem 3 is false if the characteristic of $F$ is 2.

5. How many congruence classes are there of $n \times n$ real symmetric matrices?


Supplementary Reading 

Halmos, Paul R., Finite-Dimensional Vector Spaces, 2nd ed. Princeton, N.J.: D. Van 
Nostrand Company, 1958.


7

Selected Topics

In this final chapter we have set ourselves two objectives. Our first 
is to present some mathematical results which cut deeper than most 
of the material up to now, results which are more sophisticated, and 
are a little apart from the general development which we have followed. 
Our second goal is to pick results of this kind whose discussion, in 
addition, makes vital use of a large cross section of the ideas and 
theorems expounded earlier in the book. To this end we have decided 
on three items to serve as the focal points of this chapter. 

The first of these is a celebrated theorem proved by Wedderburn in
1905 (“A Theorem on Finite Algebras,” Transactions of the American
Mathematical Society, Vol. 6 (1905), pages 349-352) which asserts that
a division ring which has only a finite number of elements must be a 
commutative field. We shall give two proofs of this theorem, differing 
totally from each other. The first one will closely follow Wedderburn’s 
original proof and will use a counting argument; it will lean heavily 
on results we developed in the chapter on group theory. The second 
one will use a mixture of group-theoretic and field-theoretic arguments, 
and will draw incisively on the material we developed in both these 
directions. The second proof has the distinct advantage that in the 
course of executing the proof certain side-results will fall out which 
will enable us to proceed to the proof, in the division ring case, of a 
beautiful theorem due to Jacobson (“Structure Theory for Algebraic 
Algebras of Bounded Degree,” Annals of Mathematics, Vol. 46 (1945), 
pages 695-707) which is a far-reaching generalization of Wedderburn’s
theorem.

Our second high spot is a theorem due to Frobenius (“Über lineare
Substitutionen und bilineare Formen,” Journal für die reine und angewandte
Mathematik, Vol. 84 (1877), especially pages 59-63) which states that the
only division rings algebraic over the field of all real numbers are the field
of real numbers, the field of complex numbers, and the division ring of real 
quaternions. The theorem points out a unique role for the quaternions, and 
makes it somewhat amazing that Hamilton should have discovered them 
in his somewhat ad hoc manner. Our proof of the Frobenius theorem, now 
quite elementary, is a variation of an approach laid out by Dickson and 
Albert; it will involve the theory of polynomials and fields. 

Our third goal is the theorem that every positive integer can be represented
as the sum of four squares. This famous result apparently was first
conjectured by the early Greek mathematician Diophantos. Fermat grappled
unsuccessfully with it and sadly announced his failure to solve it (in a paper
where he did, however, solve the two-square theorem which we proved in
Section 3.8). Euler made substantial inroads on the problem; basing his
work on that of Euler, Lagrange in 1770 finally gave the first complete proof.
Our approach will be entirely different from that of Lagrange. It is rooted
in the work of Adolf Hurwitz and will involve a generalization of Euclidean 
rings. Using our ring-theoretic techniques on a certain ring of quaternions, 
the Lagrange theorem will drop out as a consequence. 

En route to establishing these theorems many ideas and results, interesting 
in their own right, will crop up. This is characteristic of a good theorem— 
its proof invariably leads to side results of almost equal interest. 


7.1 Finite Fields 

Before we can enter into a discussion of Wedderburn’s theorem and finite 
division rings, it is essential that we investigate the nature of fields having 
only a finite number of elements. Such fields are called finite fields. Finite 
fields do exist, for the ring $J_p$ of integers modulo any prime $p$ provides us
with an example. In this section we shall determine all possible
finite fields and many of the important properties which they possess. 

We begin with 


LEMMA 7.1.1 Let $F$ be a finite field with $q$ elements and suppose that $F \subset K$
where $K$ is also a finite field. Then $K$ has $q^n$ elements where $n = [K:F]$.


Proof. $K$ is a vector space over $F$ and since $K$ is finite it is certainly finite
dimensional as a vector space over $F$. Suppose that $[K:F] = n$; then $K$
has a basis of $n$ elements over $F$. Let such a basis be $v_1, v_2, \ldots, v_n$. Then
every element in $K$ has a unique representation in the form $\alpha_1v_1 +
\alpha_2v_2 + \cdots + \alpha_nv_n$ where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are all in $F$. Thus the number of
elements in $K$ is the number of $\alpha_1v_1 + \alpha_2v_2 + \cdots + \alpha_nv_n$ as the $\alpha_1,
\alpha_2, \ldots, \alpha_n$ range over $F$. Since each coefficient can have $q$ values $K$ must
clearly have $q^n$ elements.

COROLLARY 1 Let F be a finite field; then F has p m elements where the prime 
number p is the characteristic of F. 

Proof. Since $F$ has a finite number of elements, by Corollary 2 to
Theorem 2.4.1, $f1 = 0$ where $f$ is the number of elements in $F$. Thus $F$
has characteristic $p$ for some prime number $p$. Therefore $F$ contains a field
$F_0$ isomorphic to $J_p$. Since $F_0$ has $p$ elements, $F$ has $p^m$ elements where
$m = [F:F_0]$, by Lemma 7.1.1.

COROLLARY 2 If the finite field $F$ has $p^m$ elements then every $a \in F$ satisfies
$a^{p^m} = a$.

Proof. If $a = 0$ the assertion of the corollary is trivially true.

On the other hand, the nonzero elements of $F$ form a group under
multiplication of order $p^m - 1$; thus by Corollary 2 to Theorem 2.4.1, $a^{p^m - 1} = 1$
for all $a \ne 0$ in $F$. Multiplying this relation by $a$ we obtain that $a^{p^m} = a$.

From this last corollary we can easily pass to 

LEMMA 7.1.2 If the finite field $F$ has $p^m$ elements then the polynomial $x^{p^m} - x$
in $F[x]$ factors in $F[x]$ as $x^{p^m} - x = \prod_{\lambda \in F} (x - \lambda)$.

Proof. By Lemma 5.3.2 the polynomial $x^{p^m} - x$ has at most $p^m$ roots
in $F$. However, by Corollary 2 to Lemma 7.1.1 we know $p^m$ such roots,
namely all the elements of $F$. By the corollary to Lemma 5.3.1 we can
conclude that $x^{p^m} - x = \prod_{\lambda \in F} (x - \lambda)$.

COROLLARY If the field $F$ has $p^m$ elements then $F$ is the splitting field of the
polynomial $x^{p^m} - x$.

Proof. By Lemma 7.1.2, $x^{p^m} - x$ certainly splits in $F$. However, it
cannot split in any smaller field for that field would have to have all the
roots of this polynomial and so would have to have at least $p^m$ elements.
Thus $F$ is the splitting field of $x^{p^m} - x$.

As we have seen in Chapter 5 (Theorem 5.3.4) any two splitting fields 
over a given field of a given polynomial are isomorphic. In light of the 
corollary to Lemma 7.1.2 we can state 

LEMMA 7.1.3 Any two finite fields having the same number of elements are 
isomorphic.

Proof. If these fields have $p^m$ elements, by the above corollary they are
both splitting fields of the polynomial $x^{p^m} - x$ over $J_p$, whence they are
isomorphic.

Thus for any integer $m$ and any prime number $p$ there is, up to
isomorphism, at most one field having $p^m$ elements. The purpose of the next
lemma is to demonstrate that for any prime number $p$ and any integer $m$
there is a field having $p^m$ elements. When this is done we shall know that
there is exactly one field having $p^m$ elements where $p$ is an arbitrary prime
and $m$ an arbitrary integer.

LEMMA 7.1.4 For every prime number $p$ and every positive integer $m$ there exists
a field having $p^m$ elements.

Proof. Consider the polynomial $x^{p^m} - x$ in $J_p[x]$, the ring of polynomials
in $x$ over $J_p$, the field of integers mod $p$. Let $K$ be the splitting field of this
polynomial. In $K$ let $F = \{a \in K \mid a^{p^m} = a\}$. The elements of $F$ are thus
the roots of $x^{p^m} - x$, which by Corollary 2 to Lemma 5.5.2 are distinct,
whence $F$ has $p^m$ elements. We now claim that $F$ is a field. If $a, b \in F$
then $a^{p^m} = a$, $b^{p^m} = b$ and so $(ab)^{p^m} = a^{p^m}b^{p^m} = ab$; thus $ab \in F$. Also
since the characteristic is $p$, $(a \pm b)^{p^m} = a^{p^m} \pm b^{p^m} = a \pm b$, hence
$a \pm b \in F$. Consequently $F$ is a subfield of $K$ and so is a field. Having
exhibited the field $F$ having $p^m$ elements we have proved Lemma 7.1.4.

Combining Lemmas 7.1.3 and 7.1.4 we have 

THEOREM 7.1.1 For every prime number $p$ and every positive integer $m$ there
is a unique field having $p^m$ elements.
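
To make the smallest non-prime case concrete, here is a sketch of the field with $3^2 = 9$ elements. It does not follow the splitting-field construction of Lemma 7.1.4; instead it assumes the model $J_3[x]/(x^2 + 1)$, which is a field because $-1$ is not a square mod 3. Elements are written as pairs `(a, b)` standing for $a + bi$ with $i^2 = -1$; all names are illustrative.

```python
P = 3  # x^2 + 1 is irreducible over J_3 because -1 is not a square mod 3

# Each pair (a, b) stands for a + b*i with i^2 = -1 and a, b in J_3.
ELEMENTS = [(a, b) for a in range(P) for b in range(P)]
ZERO, ONE = (0, 0), (1, 0)


def mul(u, v):
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, coefficients reduced mod P."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % P, (a * d + b * c) % P)


def power(u, k):
    """k-fold product of u with itself (k >= 0)."""
    result = ONE
    for _ in range(k):
        result = mul(result, u)
    return result


# Every nonzero element has a multiplicative inverse, so this is a field.
every_nonzero_invertible = all(
    any(mul(u, v) == ONE for v in ELEMENTS) for u in ELEMENTS if u != ZERO
)

# Corollary 2 to Lemma 7.1.1: every element satisfies a^(p^m) = a, here a^9 = a.
all_fixed = all(power(u, P ** 2) == u for u in ELEMENTS)
```

By Theorem 7.1.1 any other construction of a field with 9 elements gives an isomorphic result.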

We now return to group theory for a moment. The group-theoretic 
result we seek will determine the structure of any finite multiplicative 
subgroup of the group of nonzero elements of any field, and, in particular, 
it will determine the multiplicative structure of any finite field. 

LEMMA 7.1.5 Let $G$ be a finite abelian group enjoying the property that the
relation $x^n = e$ is satisfied by at most $n$ elements of $G$, for every integer $n$. Then $G$
is a cyclic group.

Proof. If the order of $G$ is a power of some prime number $q$ then the
result is very easy. For suppose that $a \in G$ is an element whose order is as
large as possible; its order must be $q^r$ for some integer $r$. The elements
$e, a, a^2, \ldots, a^{q^r - 1}$ give us $q^r$ distinct solutions of the equation $x^{q^r} = e$,
which, by our hypothesis, implies that these are all the solutions of this
equation. Now if $b \in G$ its order is $q^s$ where $s \le r$, hence $b^{q^r} = (b^{q^s})^{q^{r-s}} = e$.
By the observation made above this forces $b = a^i$ for some $i$, and so $G$ is
cyclic.

The general finite abelian group $G$ can be realized as $G = S_{q_1}S_{q_2} \cdots S_{q_k}$
where the $q_i$ are the distinct prime divisors of $o(G)$ and where the $S_{q_i}$ are
the Sylow subgroups of $G$. Moreover, every element $g \in G$ can be written
in a unique way as $g = s_1s_2 \cdots s_k$ where $s_i \in S_{q_i}$ (see Section 2.7). Any
solution of $x^n = e$ in $S_{q_i}$ is one of $x^n = e$ in $G$ so that each $S_{q_i}$ inherits the
hypothesis we have imposed on $G$. By the remarks of the first paragraph
of the proof, each $S_{q_i}$ is a cyclic group; let $a_i$ be a generator of $S_{q_i}$. We
claim that $c = a_1a_2 \cdots a_k$ is a cyclic generator of $G$. To verify this all
we must do is prove that $o(G)$ divides $m$, the order of $c$. Since $c^m = e$, we
have that $a_1^m a_2^m \cdots a_k^m = e$. By the uniqueness of representation of an
element of $G$ as a product of elements in the $S_{q_i}$, we conclude that each
$a_i^m = e$. Thus $o(S_{q_i}) \mid m$ for every $i$. Since these orders are powers of
distinct primes, $o(G) = o(S_{q_1})o(S_{q_2}) \cdots o(S_{q_k}) \mid m$.
However, $m \mid o(G)$ and so $o(G) = m$. This proves that $G$ is cyclic.

Lemma 7.1.5 has as an important consequence 

LEMMA 7.1.6 Let K be a field and let G be a finite subgroup of the multiplicative 
group of nonzero elements of K. Then G is a cyclic group. 

Proof. Since $K$ is a field, any polynomial of degree $n$ in $K[x]$ has at most
$n$ roots in $K$. Thus in particular, for any integer $n$, the polynomial $x^n - 1$
has at most $n$ roots in $K$, and all the more so, at most $n$ roots in $G$. The
hypothesis of Lemma 7.1.5 is satisfied, so $G$ is cyclic.

Even though the situation of a finite field is merely a special case of 
Lemma 7.1.6, it is of such widespread interest that we single it out as 

THEOREM 7.1.2 The multiplicative group of nonzero elements of a finite field 
is cyclic. 

Proof. Let F be a finite field. By merely applying Lemma 7.1.6 with 
F = K and G = the group of nonzero elements of F, the result drops out. 
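
For the prime field $J_p$ the cyclic generators promised by the theorem can be found by brute force. The following is a small sketch with hypothetical function names; it simply computes the multiplicative order of every nonzero residue and keeps those of order $p - 1$.

```python
def multiplicative_order(a, p):
    """Smallest k >= 1 with a^k = 1 in J_p (a not divisible by the prime p)."""
    order, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        order += 1
    return order


def cyclic_generators(p):
    """All generators of the multiplicative group of nonzero elements of J_p."""
    return [a for a in range(1, p) if multiplicative_order(a, p) == p - 1]


generators_mod_7 = cyclic_generators(7)    # [3, 5]
generators_mod_11 = cyclic_generators(11)  # [2, 6, 7, 8]
```

The search is exponential in the number of digits of $p$ and is meant only for experimenting with small primes.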

We conclude this section by using a counting argument to prove the 
existence of solutions of certain equations in a finite field. We shall need 
the result in one proof of the Wedderburn theorem. 

LEMMA 7.1.7 If $F$ is a finite field and $\alpha \ne 0$, $\beta \ne 0$ are two elements of $F$
then we can find elements $a$ and $b$ in $F$ such that $1 + \alpha a^2 + \beta b^2 = 0$.

Proof. If the characteristic of $F$ is 2, $F$ has $2^n$ elements and every
element $x$ in $F$ satisfies $x^{2^n} = x$. Thus every element in $F$ is a square. In
particular $\alpha^{-1} = a^2$ for some $a \in F$. Using this $a$ and $b = 0$, we have
$1 + \alpha a^2 + \beta b^2 = 1 + \alpha\alpha^{-1} + 0 = 1 + 1 = 0$, the last equality being a
consequence of the fact that the characteristic of $F$ is 2.

If the characteristic of $F$ is an odd prime $p$, $F$ has $p^n$ elements. Let
$W_\alpha = \{1 + \alpha x^2 \mid x \in F\}$. How many elements are there in $W_\alpha$? We
must check how often $1 + \alpha x^2 = 1 + \alpha y^2$. But this relation forces $\alpha x^2 =
\alpha y^2$ and so, since $\alpha \ne 0$, $x^2 = y^2$. Finally this leads to $x = \pm y$. Thus for
$x \ne 0$ we get from each pair $x$ and $-x$ one element in $W_\alpha$, and for $x = 0$
we get $1 \in W_\alpha$. Thus $W_\alpha$ has $1 + (p^n - 1)/2 = (p^n + 1)/2$ elements.
Similarly $W_\beta = \{-\beta x^2 \mid x \in F\}$ has $(p^n + 1)/2$ elements. Since each of
$W_\alpha$ and $W_\beta$ has more than half the elements of $F$ they must have a
nonempty intersection. Let $c \in W_\alpha \cap W_\beta$. Since $c \in W_\alpha$, $c = 1 + \alpha a^2$ for
some $a \in F$; since $c \in W_\beta$, $c = -\beta b^2$ for some $b \in F$. Therefore $1 + \alpha a^2 =
-\beta b^2$, which, on transposing yields the desired result $1 + \alpha a^2 + \beta b^2 = 0$.
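
The counting argument translates directly into a search procedure. The sketch below handles only the prime field $J_p$ for an odd prime $p$ (an assumption made to keep the arithmetic elementary); the function name `represent_zero` is hypothetical.

```python
def represent_zero(alpha, beta, p):
    """Return (a, b) with 1 + alpha*a^2 + beta*b^2 = 0 in J_p.

    Assumes p is an odd prime and alpha, beta are not divisible by p.
    """
    # W_alpha = {1 + alpha*x^2} and W_beta = {-beta*x^2}; keep one x per value.
    w_alpha = {(1 + alpha * x * x) % p: x for x in range(p)}
    w_beta = {(-beta * x * x) % p: x for x in range(p)}
    # Each set has (p + 1)/2 elements, so the two sets must overlap.
    assert len(w_alpha) == len(w_beta) == (p + 1) // 2
    c = min(set(w_alpha) & set(w_beta))
    return w_alpha[c], w_beta[c]


a, b = represent_zero(2, 3, 7)
solves_equation = (1 + 2 * a * a + 3 * b * b) % 7 == 0
```

Any element of the intersection would do; taking the minimum merely makes the output deterministic.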

Problems 

1. By Theorem 7.1.2 the nonzero elements of J p form a cyclic group under 
multiplication. Any generator of this group is called a primitive root of p. 

(a) Find primitive roots of: 17, 23, 31. 

(b) How many primitive roots does a prime p have? 

2. Using Theorem 7.1.2 prove that $x^2 \equiv -1 \bmod p$ is solvable if and only
if the odd prime $p$ is of the form $4n + 1$.

3. If $a$ is an integer not divisible by the odd prime $p$, prove that $x^2 \equiv a
\bmod p$ is solvable for some integer $x$ if and only if $a^{(p-1)/2} \equiv 1 \bmod p$.
(This is called the Euler criterion that $a$ be a quadratic residue mod $p$.)

4. Using the result of Problem 3 determine if:

(a) 3 is a square mod 17.

(b) 10 is a square mod 13.

5. If the field $F$ has $p^n$ elements prove that the automorphisms of $F$ form
a cyclic group of order $n$.

6. If $F$ is a finite field, by the quaternions over $F$ we shall mean the set of
all $\alpha_0 + \alpha_1 i + \alpha_2 j + \alpha_3 k$ where $\alpha_0, \alpha_1, \alpha_2, \alpha_3 \in F$ and where addition
and multiplication are carried out as in the real quaternions (i.e.,
$i^2 = j^2 = k^2 = ijk = -1$, etc.). Prove that the quaternions over a
finite field do not form a division ring.

7.2 Wedderburn's Theorem on Finite Division Rings

In 1905 Wedderburn proved the theorem, now considered a classic, that a 
finite division ring must be a commutative field. This result has caught the 
imagination of most mathematicians because it is so unexpected, interrelating 
two seemingly unrelated things, namely the number of elements in a certain
algebraic system and the multiplication of that system. Aside from its
intrinsic beauty the result has been very important and useful since it arises
in so many contexts. To cite just one instance, the only known proof of the
purely geometric fact that in a finite geometry the Desargues configuration
implies that of Pappus (for the definition of these terms look in any good
book on projective geometry) is to reduce the geometric problem to an
algebraic one, and this algebraic question is then answered by invoking the
Wedderburn theorem. For algebraists the Wedderburn theorem has served
as a jumping-off point for a large area of research, in the 1940s and 1950s,
concerned with the commutativity of rings.


THEOREM 7.2.1 (Wedderburn) A finite division ring is necessarily a 
commutative field. 


First Proof. Let $K$ be a finite division ring and let $Z = \{z \in K \mid zx = xz$
for all $x \in K\}$ be its center. If $Z$ has $q$ elements then, as in the proof of
Lemma 7.1.1, it follows that $K$ has $q^n$ elements. Our aim is to prove that
$Z = K$, or, equivalently, that $n = 1$.

If $a \in K$ let $N(a) = \{x \in K \mid xa = ax\}$. $N(a)$ clearly contains $Z$, and,
as a simple check reveals, $N(a)$ is a subdivision ring of $K$. Thus $N(a)$
contains $q^{n(a)}$ elements for some integer $n(a)$. We claim that $n(a) \mid n$. For,
the nonzero elements of $N(a)$ form a subgroup of order $q^{n(a)} - 1$ of the
group of nonzero elements, under multiplication, of $K$ which has $q^n - 1$
elements. By Lagrange’s theorem (Theorem 2.4.1) $q^{n(a)} - 1$ is a divisor
of $q^n - 1$; but this forces $n(a)$ to be a divisor of $n$ (see Problem 1 at the end
of this section).
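
The divisibility fact used here, that $q^m - 1$ divides $q^n - 1$ exactly when $m$ divides $n$, is easy to test numerically before proving it. The following sketch is only an experiment over a finite range, not a proof; the function name and the ranges tested are arbitrary choices.

```python
def divisibility_matches(q, limit):
    """Check, for 1 <= m <= n <= limit, that
    (q^m - 1) divides (q^n - 1) if and only if m divides n."""
    return all(
        ((q ** n - 1) % (q ** m - 1) == 0) == (n % m == 0)
        for n in range(1, limit + 1)
        for m in range(1, n + 1)
    )


# q ranges over a few small integers greater than 1.
pattern_holds = all(divisibility_matches(q, 12) for q in (2, 3, 4, 5, 7))
```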

In the group of nonzero elements of $K$ we have the conjugacy relation
used in Chapter 2, namely $a$ is a conjugate of $b$ if $a = x^{-1}bx$ for some
$x \ne 0$ in $K$.

By Theorem 2.11.1 the number of elements in $K$ conjugate to $a$ is the
index of the normalizer of $a$ in the group of nonzero elements of $K$. Therefore
the number of conjugates of $a$ in $K$ is $(q^n - 1)/(q^{n(a)} - 1)$. Now $a \in Z$ if
and only if $n(a) = n$, thus by the class equation (see the corollary to
Theorem 2.11.1)
\[
q^n - 1 = q - 1 + \sum_{\substack{n(a) \mid n \\ n(a) \ne n}} \frac{q^n - 1}{q^{n(a)} - 1} \tag{1}
\]
where the sum is carried out over one $a$ in each conjugate class for $a$'s not
in the center.

The problem has been reduced to proving that no equation such as (1)
can hold in the integers. Up to this point we have followed the proof in
Wedderburn’s original paper quite closely. He went on to rule out the
possibility of equation (1) by making use of the following number-theoretic
result due to Birkhoff and Vandiver: for $n > 1$ there exists a prime number
which is a divisor of $q^n - 1$ but is not a divisor of any $q^m - 1$ where $m$ is a
proper divisor of $n$, with the exceptions of $2^6 - 1 = 63$ whose prime factors
already occur as divisors of $2^2 - 1$ and $2^3 - 1$, and $n = 2$, and $q$ a prime
of the form $2^k - 1$. If we grant this result, how would we finish the proof?
This prime number would be a divisor of the left-hand side of (1) and also
a divisor of each term in the sum occurring on the right-hand side since it
divides $q^n - 1$ but not $q^{n(a)} - 1$; thus this prime would then divide $q - 1$
giving us a contradiction. The case $2^6 - 1$ still would need ruling out but
that is simple. In case $n = 2$, the other possibility not covered by the
above argument, there can be no subfield between $Z$ and $K$ and this forces
$Z = K$. (Prove! See Problem 2.)

However, we do not want to invoke the result of Birkhoff and Vandiver
without proving it, and its proof would be too large a digression here. So
we look for another artifice. Our aim is to find an integer which divides
$(q^n - 1)/(q^{n(a)} - 1)$, for all divisors $n(a)$ of $n$ except $n(a) = n$, but does
not divide $q - 1$. Once this is done, equation (1) will be impossible unless
$n = 1$ and, therefore, Wedderburn’s theorem will have been proved. The
means to this end is the theory of cyclotomic polynomials. (These have
been mentioned in the problems at the end of Section 5.6.)

Consider the polynomial $x^n - 1$ considered as an element of $C[x]$ where
$C$ is the field of complex numbers. In $C[x]$
\[
x^n - 1 = \prod (x - \lambda) \tag{2}
\]
where this product is taken over all $\lambda$ satisfying $\lambda^n = 1$.

A complex number $\theta$ is said to be a primitive $n$th root of unity if $\theta^n = 1$
but $\theta^m \ne 1$ for any positive integer $m < n$. The complex numbers
satisfying $x^n = 1$ form a finite subgroup, under multiplication, of the complex
numbers, so by Lemma 7.1.6 this group is cyclic. Any cyclic generator of
this group must then be a primitive $n$th root of unity, so we know that such
primitive roots exist. (Alternatively, $\theta = e^{2\pi i/n}$ yields us a primitive $n$th
root of unity.)

Let $\Phi_n(x) = \prod (x - \theta)$ where this product is taken over all the primitive
$n$th roots of unity. This polynomial is called a cyclotomic polynomial. We
list the first few cyclotomic polynomials: $\Phi_1(x) = x - 1$, $\Phi_2(x) = x + 1$,
$\Phi_3(x) = x^2 + x + 1$, $\Phi_4(x) = x^2 + 1$, $\Phi_5(x) = x^4 + x^3 + x^2 + x + 1$,
$\Phi_6(x) = x^2 - x + 1$. Notice that these are all monic polynomials with
integer coefficients.

Our first aim is to prove that in general $\Phi_n(x)$ is a monic polynomial with
integer coefficients. We regroup the factored form of $x^n - 1$ as given in (2),
and obtain
\[
x^n - 1 = \prod_{d \mid n} \Phi_d(x). \tag{3}
\]
By induction we assume that $\Phi_d(x)$ is a monic polynomial with integer
coefficients for $d \mid n$, $d \ne n$. Thus $x^n - 1 = \Phi_n(x)g(x)$ where $g(x)$ is a
monic polynomial with integer coefficients. Therefore,
\[
\Phi_n(x) = \frac{x^n - 1}{g(x)},
\]
which, on actual division (or by comparing coefficients), tells us that $\Phi_n(x)$
is a monic polynomial with integer coefficients.
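
The inductive argument is effectively an algorithm: divide $x^n - 1$ by the product of the $\Phi_d(x)$ for the proper divisors $d$ of $n$. The sketch below carries this out with integer coefficient tuples listed from the constant term upward; the helper names are hypothetical, and the exactness of each division is asserted rather than assumed.

```python
from functools import lru_cache


def poly_mul(f, g):
    """Product of two polynomials given as coefficient sequences (low degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out


def poly_div_exact(f, g):
    """Quotient of f by the monic polynomial g; the remainder must be zero."""
    f = list(f)
    quotient = [0] * (len(f) - len(g) + 1)
    for shift in range(len(f) - len(g), -1, -1):
        coeff = f[shift + len(g) - 1]  # current leading coefficient
        quotient[shift] = coeff
        for j, b in enumerate(g):
            f[shift + j] -= coeff * b
    assert not any(f), "division was not exact"
    return quotient


@lru_cache(maxsize=None)
def cyclotomic(n):
    """Phi_n(x) = (x^n - 1) / product of Phi_d(x) over proper divisors d of n."""
    g = [1]
    for d in range(1, n):
        if n % d == 0:
            g = poly_mul(g, cyclotomic(d))
    x_n_minus_1 = [-1] + [0] * (n - 1) + [1]
    return tuple(poly_div_exact(x_n_minus_1, g))


phi_6 = cyclotomic(6)    # (1, -1, 1), i.e. x^2 - x + 1
phi_12 = cyclotomic(12)  # (1, 0, -1, 0, 1), i.e. x^4 - x^2 + 1
```

Because every divisor is monic with integer coefficients, the long division never leaves the integers, which is exactly the content of the induction step.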

We now claim that for any divisor $d$ of $n$, where $d \ne n$,
\[
\Phi_n(x) \,\Big|\, \frac{x^n - 1}{x^d - 1}
\]
in the sense that the quotient is a polynomial with integer coefficients. To
see this, first note that
\[
x^d - 1 = \prod_{k \mid d} \Phi_k(x),
\]
and since every divisor of $d$ is also a divisor of $n$, by regrouping terms on
the right-hand side of (3) we obtain $x^d - 1$ on the right-hand side; also
since $d < n$, $x^d - 1$ does not involve $\Phi_n(x)$. Therefore, $x^n - 1 =
\Phi_n(x)(x^d - 1)f(x)$ where
\[
f(x) = \prod_{\substack{k \mid n,\ k \ne n \\ k \nmid d}} \Phi_k(x)
\]
has integer coefficients, and so
\[
\Phi_n(x) \,\Big|\, \frac{x^n - 1}{x^d - 1}
\]
in the sense that the quotient is a polynomial with integer coefficients.
This establishes our claim.

For any integer $t$, $\Phi_n(t)$ is an integer and from the above as an integer
divides $(t^n - 1)/(t^d - 1)$. In particular, returning to equation (1),
\[
\Phi_n(q) \,\Big|\, \frac{q^n - 1}{q^{n(a)} - 1}
\]
and $\Phi_n(q) \mid (q^n - 1)$; thus by (1), $\Phi_n(q) \mid (q - 1)$. We claim, however,
that if $n > 1$ then $|\Phi_n(q)| > q - 1$. For $\Phi_n(q) = \prod (q - \theta)$ where $\theta$ runs
over all primitive $n$th roots of unity and $|q - \theta| > q - 1$ for all $\theta \ne 1$
a root of unity (Prove!) whence $|\Phi_n(q)| = \prod |q - \theta| > q - 1$. Clearly,
then $\Phi_n(q)$ cannot divide $q - 1$, leading us to a contradiction. We must,
therefore, assume that $n = 1$, forcing the truth of the Wedderburn theorem.
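
The two facts about $\Phi_n(q)$ that drive the contradiction can be checked on small cases. The sketch below evaluates $\Phi_n(q)$ as an integer through the relation $q^n - 1 = \prod_{d \mid n} \Phi_d(q)$ rather than through the complex roots of unity; the function names and the tested ranges are arbitrary assumptions.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def phi_value(n, q):
    """Integer value Phi_n(q), from q^n - 1 = product of Phi_d(q) over d | n."""
    product = 1
    for d in range(1, n):
        if n % d == 0:
            product *= phi_value(d, q)
    return (q ** n - 1) // product


def wedderburn_facts_hold(q, max_n):
    """For 1 < n <= max_n: Phi_n(q) > q - 1, and Phi_n(q) divides
    (q^n - 1)/(q^d - 1) for every proper divisor d of n."""
    for n in range(2, max_n + 1):
        value = phi_value(n, q)
        if value <= q - 1:
            return False
        for d in range(1, n):
            if n % d == 0 and ((q ** n - 1) // (q ** d - 1)) % value != 0:
                return False
    return True


facts_hold = all(wedderburn_facts_hold(q, 16) for q in (2, 3, 4, 5, 7, 8, 9))
```

Since $\Phi_n(q) > q - 1 > 0$, it cannot divide $q - 1$, which is the numerical face of the contradiction reached in the proof.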

Second Proof. Before explicitly examining finite division rings again,
we prove some preliminary lemmas.

LEMMA 7.2.1 Let $R$ be a ring and let $a \in R$. Let $T_a$ be the mapping of $R$
into itself defined by $xT_a = xa - ax$. Then
\[
xT_a^m = xa^m - m\,axa^{m-1} + \frac{m(m-1)}{2}\,a^2xa^{m-2}
- \frac{m(m-1)(m-2)}{3!}\,a^3xa^{m-3} + \cdots.
\]

Proof. What is $xT_a^2$? $xT_a^2 = (xT_a)T_a = (xa - ax)T_a = (xa - ax)a -
a(xa - ax) = xa^2 - 2axa + a^2x$. What about $xT_a^3$? $xT_a^3 = (xT_a^2)T_a =
(xa^2 - 2axa + a^2x)a - a(xa^2 - 2axa + a^2x) = xa^3 - 3axa^2 + 3a^2xa - a^3x$.
Continuing in this way, or by the use of induction, we get the result of
Lemma 7.2.1.
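
The formula of Lemma 7.2.1 is the binomial expansion $xT_a^m = \sum_{i=0}^{m} (-1)^i \binom{m}{i} a^i x a^{m-i}$, and it can be checked in a concrete noncommutative ring. The sketch below uses $2 \times 2$ integer matrices as that ring, which is an illustrative choice; the helper names and the sample matrices are hypothetical.

```python
from math import comb


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(A, k):
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    for _ in range(k):
        result = mat_mul(result, A)
    return result


def apply_T(x, a):
    """x T_a = xa - ax."""
    xa, ax = mat_mul(x, a), mat_mul(a, x)
    n = len(x)
    return [[xa[i][j] - ax[i][j] for j in range(n)] for i in range(n)]


def iterate_T(x, a, m):
    """x T_a^m, obtained by applying T_a to x m times."""
    for _ in range(m):
        x = apply_T(x, a)
    return x


def binomial_formula(x, a, m):
    """Sum over i of (-1)^i * C(m, i) * a^i x a^(m-i)."""
    n = len(x)
    total = [[0] * n for _ in range(n)]
    for i in range(m + 1):
        term = mat_mul(mat_mul(mat_pow(a, i), x), mat_pow(a, m - i))
        coefficient = (-1) ** i * comb(m, i)
        for r in range(n):
            for c in range(n):
                total[r][c] += coefficient * term[r][c]
    return total


a = [[1, 2], [3, 4]]
x = [[0, 1], [5, -2]]
formula_agrees = all(iterate_T(x, a, m) == binomial_formula(x, a, m)
                     for m in range(1, 7))
```

The expansion works because left multiplication by $a$ and right multiplication by $a$ commute as mappings of the ring, so the ordinary binomial theorem applies to their difference.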

COROLLARY If $R$ is a ring in which $px = 0$ for all $x \in R$, where $p$ is a prime
number, then $xT_a^{p^m} = xa^{p^m} - a^{p^m}x$.

Proof. By the formula of Lemma 7.2.1, if $p = 2$, $xT_a^2 = xa^2 - a^2x$,
since $2axa = 0$. Thus, $xT_a^4 = (xa^2 - a^2x)a^2 - a^2(xa^2 - a^2x) = xa^4 -
a^4x$, and so on for $xT_a^{2^m}$.

If $p$ is an odd prime, again by the formula of Lemma 7.2.1,
\[
xT_a^p = xa^p - p\,axa^{p-1} + \frac{p(p-1)}{2}\,a^2xa^{p-2} + \cdots - a^px,
\]
and since
\[
p \,\Big|\, \frac{p(p-1)\cdots(p-i+1)}{i!}
\]
for $i < p$, all the middle terms drop out and we are left with $xT_a^p =
xa^p - a^px = xT_{a^p}$. Now $xT_a^{p^2} = x(T_{a^p})^p = xT_{a^{p^2}}$, and so on for the
higher powers of $p$.

LEMMA 7.2.2 Let $D$ be a division ring of characteristic $p > 0$ with center $Z$,
and let $P = \{0, 1, 2, \ldots, (p - 1)\}$ be the subfield of $Z$ isomorphic to $J_p$. Suppose
that $a \in D$, $a \notin Z$ is such that $a^{p^n} = a$ for some $n \ge 1$. Then there exists an
$x \in D$ such that

1. $xax^{-1} \ne a$.

2. $xax^{-1} \in P(a)$, the field obtained by adjoining $a$ to $P$.

Proof. Define the mapping $T_a$ of $D$ into itself by $yT_a = ya - ay$ for
every $y \in D$.

$P(a)$ is a finite field, since $a$ is algebraic over $P$, and has, say, $p^m$ elements.
These all satisfy $u^{p^m} = u$. By the corollary to Lemma 7.2.1, $yT_a^{p^m} =
ya^{p^m} - a^{p^m}y = ya - ay = yT_a$, and so $T_a^{p^m} = T_a$.

Now, if $\lambda \in P(a)$, $(\lambda x)T_a = (\lambda x)a - a(\lambda x) = \lambda xa - \lambda ax = \lambda(xa - ax)
= \lambda(xT_a)$, since $\lambda$ commutes with $a$. Thus the mapping $\lambda I$ of $D$ into itself
defined by $\lambda I: y \to \lambda y$ commutes with $T_a$ for every $\lambda \in P(a)$. Now the
polynomial
\[
u^{p^m} - u = \prod_{\lambda \in P(a)} (u - \lambda)
\]
by Lemma 7.1.2. Since $T_a$ commutes with $\lambda I$ for every $\lambda \in P(a)$, and since
$T_a^{p^m} = T_a$, we have that
\[
0 = T_a^{p^m} - T_a = \prod_{\lambda \in P(a)} (T_a - \lambda I).
\]
If for every $\lambda \ne 0$ in $P(a)$, $T_a - \lambda I$ annihilates no nonzero element in
$D$ (if $y(T_a - \lambda I) = 0$ implies $y = 0$), since $T_a(T_a - \lambda_1 I) \cdots (T_a - \lambda_k I) =
0$, where $\lambda_1, \ldots, \lambda_k$ are the nonzero elements of $P(a)$, we would get
$T_a = 0$. That is, $0 = yT_a = ya - ay$ for every $y \in D$ forcing $a \in Z$
contrary to hypothesis. Thus there is a $\lambda \ne 0$ in $P(a)$ and an $x \ne 0$ in $D$
such that $x(T_a - \lambda I) = 0$. Writing this out explicitly, $xa - ax - \lambda x = 0$;
hence, $xax^{-1} = a + \lambda$ is in $P(a)$ and is not equal to $a$ since $\lambda \ne 0$. This
proves the lemma.

COROLLARY In Lemma 7.2.2, $xax^{-1} = a^i \neq a$ for some integer $i$.

Proof. Let $a$ be of order $s$; then in the field $P(a)$ all the roots of the
polynomial $u^s - 1$ are $1, a, a^2, \ldots, a^{s-1}$ since these are all distinct roots
and they are $s$ in number. Since $(xax^{-1})^s = xa^sx^{-1} = 1$, and since
$xax^{-1} \in P(a)$, $xax^{-1}$ is a root in $P(a)$ of $u^s - 1$, hence $xax^{-1} = a^i$.

We now have all the pieces that we need to carry out our second proof of 
Wedderburn’s theorem. 

Let D be a finite division ring and let Z be its center. By induction we 
may assume that any division ring having fewer elements than D is a 
commutative field. 

We first remark that if $a, b \in D$ are such that $b^ta = ab^t$ but $ba \neq ab$,
then $b^t \in Z$. For, consider $N(b^t) = \{x \in D \mid b^tx = xb^t\}$. $N(b^t)$ is a
subdivision ring of $D$; if it were not $D$, by our induction hypothesis, it would
be commutative. However, both $a$ and $b$ are in $N(b^t)$ and these do not
commute; consequently, $N(b^t)$ is not commutative so must be all of $D$.
Thus $b^t \in Z$.

Every nonzero element in $D$ has finite order, so some positive power of it
falls in $Z$. Given $w \in D$ let the order of $w$ relative to $Z$ be the smallest positive
integer $m(w)$ such that $w^{m(w)} \in Z$. Pick an element $a$ in $D$ but not in $Z$
having minimal possible order relative to $Z$, and let this order be $r$. We
claim that $r$ is a prime number, for if $r = r_1r_2$ with $1 < r_1 < r$ then $a^{r_1}$ is not
in $Z$. Yet $(a^{r_1})^{r_2} = a^r \in Z$, implying that $a^{r_1}$ has an order relative to $Z$
smaller than that of $a$.

By the corollary to Lemma 7.2.2 there is an $x \in D$ such that $xax^{-1} =
a^i \neq a$; thus $x^2ax^{-2} = x(xax^{-1})x^{-1} = xa^ix^{-1} = (xax^{-1})^i = (a^i)^i = a^{i^2}$.
Similarly, we get $x^{r-1}ax^{-(r-1)} = a^{i^{r-1}}$. However, $r$ is a prime number,
thus by the little Fermat theorem (corollary to Theorem 2.4.1), $i^{r-1} =
1 + u_0r$, hence $a^{i^{r-1}} = a^{1+u_0r} = aa^{u_0r} = \lambda a$ where $\lambda = a^{u_0r} \in Z$. Thus
$x^{r-1}a = \lambda ax^{r-1}$. Since $x \notin Z$, by the minimal nature of $r$, $x^{r-1}$ cannot be
in $Z$. By the remark of the earlier paragraph, since $xa \neq ax$, $x^{r-1}a \neq ax^{r-1}$
and so $\lambda \neq 1$. Let $b = x^{r-1}$; thus $bab^{-1} = \lambda a$; consequently, $\lambda^ra^r =
(bab^{-1})^r = ba^rb^{-1} = a^r$ since $a^r \in Z$. This relation forces $\lambda^r = 1$.

We claim that if $y \in D$ then whenever $y^r = 1$, then $y = \lambda^i$ for some $i$,
for in the field $Z(y)$ there are at most $r$ roots of the polynomial $u^r - 1$;
the elements $1, \lambda, \lambda^2, \ldots, \lambda^{r-1}$ in $Z$ are all distinct since $\lambda$ is of the prime
order $r$ and they already account for $r$ roots of $u^r - 1$ in $Z(y)$, in
consequence of which $y = \lambda^i$.

Since $\lambda^r = 1$, $b^r = \lambda^rb^r = (\lambda b)^r = (a^{-1}ba)^r = a^{-1}b^ra$ from which we
get $ab^r = b^ra$. Since $a$ commutes with $b^r$ but does not commute with $b$, by
the remark made earlier, $b^r$ must be in $Z$. By Theorem 7.1.2 the
multiplicative group of nonzero elements of $Z$ is cyclic; let $\gamma \in Z$ be a generator.
Thus $a^r = \gamma^j$, $b^r = \gamma^k$; if $j = sr$ then $a^r = \gamma^{sr}$, whence $(a/\gamma^s)^r = 1$; this
would imply that $a/\gamma^s = \lambda^i$, leading to $a \in Z$, contrary to $a \notin Z$. Hence,
$r \nmid j$; similarly $r \nmid k$. Let $a_1 = a^k$ and $b_1 = b^j$; a direct computation
from $ba = \lambda ab$ leads to $a_1b_1 = \mu b_1a_1$ where $\mu = \lambda^{-jk} \in Z$. Since the prime
number $r$ which is the order of $\lambda$ does not divide $j$ or $k$, $\lambda^{jk} \neq 1$ hence
$\mu \neq 1$. Note that $\mu^r = 1$.

Let us see where we are. We have produced two elements $a_1, b_1$ such that

1. $a_1^r = b_1^r = \alpha \in Z$.

2. $a_1b_1 = \mu b_1a_1$ with $\mu \neq 1$ in $Z$.

3. $\mu^r = 1$.

We compute $(a_1^{-1}b_1)^r$; $(a_1^{-1}b_1)^2 = a_1^{-1}b_1a_1^{-1}b_1 = a_1^{-1}(b_1a_1^{-1})b_1 =
a_1^{-1}(\mu a_1^{-1}b_1)b_1 = \mu a_1^{-2}b_1^2$. If we compute $(a_1^{-1}b_1)^3$ we find it equal to
$\mu^{1+2}a_1^{-3}b_1^3$. Continuing, we obtain $(a_1^{-1}b_1)^r = \mu^{1+2+\cdots+(r-1)}a_1^{-r}b_1^r =
\mu^{1+2+\cdots+(r-1)} = \mu^{r(r-1)/2}$. If $r$ is an odd prime, since $\mu^r = 1$, we get
$\mu^{r(r-1)/2} = 1$, whence $(a_1^{-1}b_1)^r = 1$. Being a solution of $y^r = 1$,
$a_1^{-1}b_1 = \lambda^i$ so that $b_1 = \lambda^ia_1$; but then $\mu b_1a_1 = a_1b_1 = b_1a_1$,
contradicting $\mu \neq 1$. Thus if $r$ is an odd prime number, the theorem is proved.

We must now rule out the case $r = 2$. In that special situation we have
two elements $a_1, b_1 \in D$ such that $a_1^2 = b_1^2 = \alpha \in Z$, $a_1b_1 = \mu b_1a_1$ where
$\mu^2 = 1$ and $\mu \neq 1$. Thus $\mu = -1$ and $a_1b_1 = -b_1a_1 \neq b_1a_1$; in
consequence, the characteristic of $D$ is not 2. By Lemma 7.1.7 we can find elements
$\xi, \eta \in Z$ such that $1 + \xi^2 - \alpha\eta^2 = 0$. Consider $(a_1 + \xi b_1 + \eta a_1b_1)^2$; on
computing this out we find that $(a_1 + \xi b_1 + \eta a_1b_1)^2 = \alpha(1 + \xi^2 - \alpha\eta^2) = 0$.
Being in a division ring this yields that $a_1 + \xi b_1 + \eta a_1b_1 = 0$; thus
$0 \neq 2a_1^2 = a_1(a_1 + \xi b_1 + \eta a_1b_1) + (a_1 + \xi b_1 + \eta a_1b_1)a_1 = 0$. This
contradiction finishes the proof and Wedderburn's theorem is established.

This second proof has some advantages in that we can use parts of it to 
proceed to a remarkable result due to Jacobson, namely, 

THEOREM 7.2.2 (Jacobson) Let $D$ be a division ring such that for every
$a \in D$ there exists a positive integer $n(a) > 1$, depending on $a$, such that $a^{n(a)} = a$.
Then $D$ is a commutative field.

Proof. If $a \neq 0$ is in $D$ then $a^n = a$ and $(2a)^m = 2a$ for some integers
$n, m > 1$. Let $s = (n - 1)(m - 1) + 1$; $s > 1$ and a simple calculation
shows that $a^s = a$ and $(2a)^s = 2a$. But $(2a)^s = 2^sa^s = 2^sa$, whence
$2^sa = 2a$ from which we get $(2^s - 2)a = 0$. Thus $D$ has characteristic
$p > 0$. If $P \subset Z$ is the field having $p$ elements (isomorphic to $J_p$), since
$a$ is algebraic over $P$, $P(a)$ has a finite number of elements, in fact, $p^h$
elements for some integer $h$. Thus, since $a \in P(a)$, $a^{p^h} = a$. Therefore, if
$a \notin Z$ all the conditions of Lemma 7.2.2 are satisfied, hence there exists a
$b \in D$ such that

$$bab^{-1} = a^{\mu} \neq a. \qquad (1)$$

By the same argument, $b^{p^k} = b$ for some integer $k \geq 1$. Let

$$W = \Big\{x \in D \;\Big|\; x = \sum_{i=1}^{p^h} \sum_{j=1}^{p^k} p_{ij}\,a^ib^j \text{ where } p_{ij} \in P\Big\}.$$

$W$ is finite and is closed under addition. By virtue of (1) it is also closed
under multiplication. (Verify!) Thus $W$ is a finite ring, and being a
subring of the division ring $D$, it itself must be a division ring (Problem 3).
Thus $W$ is a finite division ring; by Wedderburn's theorem it is commutative.
But $a$ and $b$ are both in $W$; therefore, $ab = ba$ contrary to $a^{\mu}b = ba$. This
proves the theorem.
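
The "simple calculation" at the start of the proof can be spelled out as follows. This is only a sketch of one way to see it; it uses nothing beyond $a^n = a$ and $(2a)^m = 2a$, and the auxiliary integer $k$ is introduced here for the induction.

```latex
% If a^n = a, then a^{1 + k(n-1)} = a for every integer k >= 0.
% The case k = 0 is trivial; the induction step is
\[
a^{1+(k+1)(n-1)} = a^{1+k(n-1)}\,a^{n-1} = a \cdot a^{n-1} = a^{n} = a .
\]
% Taking k = m - 1 gives the exponent s = (n-1)(m-1) + 1:
\[
a^{s} = a^{1+(m-1)(n-1)} = a .
\]
% The same argument applied to (2a)^m = 2a, now with k = n - 1, gives
\[
(2a)^{s} = (2a)^{1+(n-1)(m-1)} = 2a .
\]
```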

Jacobson's theorem actually holds for any ring $R$ satisfying $a^{n(a)} = a$ for
every $a \in R$, not just for division rings. The transition from the division
ring case to the general case, while not difficult, involves the axiom of choice,
and to discuss it would take us too far afield.

Problems 

1. If $t > 1$ is an integer and $(t^m - 1) \mid (t^n - 1)$, prove that $m \mid n$.

2. If D is a division ring, prove that its dimension (as a vector space) 
over its center cannot be 2. 

3. Show that any finite subring of a division ring is a division ring.

4. (a) Let $D$ be a division ring of characteristic $p \neq 0$ and let $G$ be a
finite subgroup of the group of nonzero elements of $D$ under
multiplication. Prove that $G$ is abelian. (Hint: consider the
subset $\{x \in D \mid x = \sum \lambda_ig_i,\ \lambda_i \in P,\ g_i \in G\}$.)

(b) In part (a) prove that $G$ is actually cyclic.

*5. (a) If $R$ is a finite ring in which $x^n = x$ for all $x \in R$, where $n > 1$,
prove that $R$ is commutative.

(b) If R is a finite ring in which x 2 = 0 implies that x = 0, prove 
that R is commutative. 

*6. Let $D$ be a division ring and suppose that $a \in D$ only has a finite
number of conjugates (i.e., only a finite number of distinct $x^{-1}ax$).
Prove that $a$ has only one conjugate and must be in the center of $D$.

7. Use the result of Problem 6 to prove that if a polynomial of degree n 
having coefficients in the center of a division ring has n + 1 roots in the 
division ring then it has an infinite number of roots in that division ring. 

*8. Let $D$ be a division ring and $K$ a subdivision ring of $D$ such that
$xKx^{-1} \subset K$ for every $x \neq 0$ in $D$. Prove that either $K \subset Z$, the center
of $D$, or $K = D$. (This result is known as the Brauer-Cartan-Hua theorem.)

*9. Let D be a division ring and K a subdivision ring of D. Suppose that 
the group of nonzero elements of K is a subgroup of finite index in the 
group (under multiplication) of nonzero elements of D. Prove that 
either D is finite or K = D. 

10. If $\theta \neq 1$ is a root of unity and if $q$ is a positive integer, prove that
$|q - \theta| > q - 1$.

7.3 A Theorem of Frobenius 

In 1877 Frobenius classified all division rings having the field of real numbers 
in their center and satisfying, in addition, one other condition to be described 
below. The aim of this section is to present this result of Frobenius. 

In Chapter 6 we brought attention to two important facts about the 
field of complex numbers. We recall them here: 

FACT 1 Every polynomial of degree n over the field of complex numbers
has all its n roots in the field of complex numbers.

FACT 2 The only irreducible polynomials over the field of real numbers
are of degree 1 or 2.

DEFINITION A division algebra $D$ is said to be algebraic over a field $F$ if

1. $F$ is contained in the center of $D$;

2. every $a \in D$ satisfies a nontrivial polynomial with coefficients in $F$.

If $D$, as a vector space, is finite-dimensional over the field $F$ which is
contained in its center, it can easily be shown that $D$ is algebraic over $F$ (see
Problem 1, end of this section). However, it can happen that $D$ is algebraic
over $F$ yet is not finite-dimensional over $F$.

We start our investigation of division rings algebraic over the real field 
by first finding those algebraic over the complex field. 

LEMMA 7.3.1 Let C be the field of complex numbers and suppose that the division 
ring D is algebraic over C. Then D = C. 

Proof. Suppose that $a \in D$. Since $D$ is algebraic over $C$, $a^n +
\alpha_1a^{n-1} + \cdots + \alpha_{n-1}a + \alpha_n = 0$ for some $\alpha_1, \alpha_2, \ldots, \alpha_n$ in $C$.

Now the polynomial $p(x) = x^n + \alpha_1x^{n-1} + \cdots + \alpha_{n-1}x + \alpha_n$ in $C[x]$,
by Fact 1, can be factored, in $C[x]$, into a product of linear factors; that is,
$p(x) = (x - \lambda_1)(x - \lambda_2) \cdots (x - \lambda_n)$, where $\lambda_1, \lambda_2, \ldots, \lambda_n$ are all in $C$.
Since $C$ is in the center of $D$, every element of $C$ commutes with $a$, hence
$p(a) = (a - \lambda_1)(a - \lambda_2) \cdots (a - \lambda_n)$. But, by assumption, $p(a) = 0$,
thus $(a - \lambda_1)(a - \lambda_2) \cdots (a - \lambda_n) = 0$. Since a product in a division
ring is zero only if one of the terms of the product is zero, we conclude that
$a - \lambda_k = 0$ for some $k$, hence $a = \lambda_k$, from which we get that $a \in C$.
Therefore, every element of $D$ is in $C$; since $C \subset D$, we obtain $D = C$.

We are now in a position to prove the classic result of Frobenius, namely, 

THEOREM 7.3.1 (Frobenius) Let D be a division ring algebraic over F, 
the field of real numbers. Then D is isomorphic to one of: the field of real numbers, 
the field of complex numbers, or the division ring of real quaternions. 

Proof. The proof consists of three parts. In the first, and easiest, we 
dispose of the commutative case; in the second, assuming that D is not 
commutative, we construct a replica of the real quaternions in D ; in the 
third part we show that this replica of the quaternions fills out all of D. 

Suppose that $D \neq F$ and that $a$ is in $D$ but not in $F$. By our assumptions,
$a$ satisfies some polynomial over $F$, hence some irreducible polynomial over
$F$. In consequence of Fact 2, $a$ satisfies either a linear or quadratic equation
over $F$. If this equation is linear, $a$ must be in $F$ contrary to assumption.
So we may suppose that $a^2 - 2\alpha a + \beta = 0$ where $\alpha, \beta \in F$. Thus
$(a - \alpha)^2 = \alpha^2 - \beta$; we claim that $\alpha^2 - \beta < 0$ for, otherwise, it would
have a real square root $\delta$ and we would have $a - \alpha = \pm\delta$ and so $a$ would
be in $F$. Since $\alpha^2 - \beta < 0$ it can be written as $-\gamma^2$ where $\gamma \in F$.
Consequently $(a - \alpha)^2 = -\gamma^2$, whence $[(a - \alpha)/\gamma]^2 = -1$. Thus if $a \in D$,
$a \notin F$ we can find real $\alpha, \gamma$ such that $[(a - \alpha)/\gamma]^2 = -1$.

If $D$ is commutative, pick $a \in D$, $a \notin F$ and let $i = (a - \alpha)/\gamma$ where $\alpha, \gamma$
in $F$ are chosen so as to make $i^2 = -1$. Therefore $D$ contains $F(i)$, a field
isomorphic to the field of complex numbers. Since $D$ is commutative and
algebraic over $F$ it is, all the more so, algebraic over $F(i)$. By Lemma 7.3.1
we conclude that $D = F(i)$. Thus if $D$ is commutative it is either $F$ or $F(i)$.

Assume, then, that D is not commutative. We claim that the center of D 
must be exactly F. If not, there is an a in the center, a not in F. But then 
for some $\alpha, \gamma \in F$, $[(a - \alpha)/\gamma]^2 = -1$ so that the center contains a field
isomorphic to the complex numbers. However, by Lemma 7.3.1 if the 
complex numbers (or an isomorph of them) were in the center of D then 
D = C forcing D to be commutative. Hence F is the center of D. 

Let $a \in D$, $a \notin F$; for some $\alpha, \gamma \in F$, $i = (a - \alpha)/\gamma$ satisfies $i^2 = -1$.
Since $i \notin F$, $i$ is not in the center of $D$. Therefore there is an element $b \in D$
such that $c = bi - ib \neq 0$. We compute $ic + ci$; $ic + ci = i(bi - ib) +
(bi - ib)i = ibi - i^2b + bi^2 - ibi = 0$ since $i^2 = -1$. Thus $ic = -ci$;
from this we get $ic^2 = -c(ic) = -c(-ci) = c^2i$, and so $c^2$ commutes
with $i$. Now $c$ satisfies some quadratic equation over $F$, $c^2 + \lambda c + \mu = 0$.
Since $c^2$ and $\mu$ commute with $i$, $\lambda c$ must commute with $i$; that is, $\lambda ci =
i\lambda c = \lambda ic = -\lambda ci$, hence $2\lambda ci = 0$, and since $2ci \neq 0$ we have that $\lambda = 0$.
Thus $c^2 = -\mu$; since $c \notin F$ (for $ci = -ic \neq ic$) we can say, as we have
before, that $\mu$ is positive and so $\mu = \nu^2$ where $\nu \in F$. Therefore $c^2 = -\nu^2$;
let $j = c/\nu$. Then $j$ satisfies

1. $j^2 = \dfrac{c^2}{\nu^2} = -1$.

2. $ji + ij = \dfrac{c}{\nu}\,i + i\,\dfrac{c}{\nu} = \dfrac{ci + ic}{\nu} = 0$.

Let $k = ij$. The $i, j, k$ we have constructed behave like those for the
quaternions, whence $T = \{\alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k \mid \alpha_0, \alpha_1, \alpha_2, \alpha_3 \in F\}$ forms a
subdivision ring of $D$ isomorphic to the real quaternions. We have produced
a replica, $T$, of the division ring of real quaternions in $D$!
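
The construction of j and k from i and a non-commuting element b can be traced in a concrete case. The sketch below is an illustration only: it assumes D is the real quaternions themselves, stored as 4-tuples of coefficients of 1, i, j, k, and picks b = 1 + 2i + 3j + 4k so that the square root nu is the rational number 10. The names `qmul`, `qsub`, `qscale`, `I`, `J`, `K` are hypothetical.

```python
from fractions import Fraction

def qmul(x, y):
    """Quaternion product of coefficient 4-tuples (1, i, j, k order)."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def qsub(x, y):
    """Componentwise difference."""
    return tuple(s - t for s, t in zip(x, y))

def qscale(r, x):
    """Multiply a quaternion by a real scalar r."""
    return tuple(r * t for t in x)

ONE = (Fraction(1), Fraction(0), Fraction(0), Fraction(0))
MINUS_ONE = qscale(-1, ONE)
I = (Fraction(0), Fraction(1), Fraction(0), Fraction(0))

# b is an arbitrary element that does not commute with i (chosen for exact arithmetic).
b = tuple(Fraction(t) for t in (1, 2, 3, 4))

c = qsub(qmul(b, I), qmul(I, b))   # c = bi - ib, nonzero and anticommuting with i
c_squared = qmul(c, c)             # equals -mu with mu > 0 real; here mu = 100
nu = 10                            # nu^2 = mu (hard-coded for this particular b)
J = qscale(Fraction(1, nu), c)     # j = c / nu
K = qmul(I, J)                     # k = ij
```

With this b the new J is (4/5)j - (3/5)k rather than the standard j, which shows that the replica T depends on the choices made, although it is always isomorphic to the quaternions.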

Our last objective is to demonstrate that $T = D$.

If $r \in D$ satisfies $r^2 = -1$ let $N(r) = \{x \in D \mid xr = rx\}$. $N(r)$ is a
subdivision ring of $D$; moreover $r$, and so all $\alpha_0 + \alpha_1r$, $\alpha_0, \alpha_1 \in F$, are in the
center of $N(r)$. By Lemma 7.3.1 it follows that $N(r) = \{\alpha_0 + \alpha_1r \mid \alpha_0,
\alpha_1 \in F\}$. Thus if $xr = rx$ then $x = \alpha_0 + \alpha_1r$ for some $\alpha_0, \alpha_1$ in $F$.

Suppose that $u \in D$, $u \notin F$. For some $\alpha, \beta \in F$, $w = (u - \alpha)/\beta$ satisfies
$w^2 = -1$. We claim that $wi + iw$ commutes with both $i$ and $w$; for
$i(wi + iw) = iwi + i^2w = iwi + wi^2 = (iw + wi)i$ since $i^2 = -1$.
Similarly $w(wi + iw) = (wi + iw)w$. By the remark of the preceding
paragraph, $wi + iw = \alpha_0' + \alpha_1'i = \alpha_0 + \alpha_1w$. If $w \notin T$ this last relation
forces $\alpha_1 = 0$ (for otherwise we could solve for $w$ in terms of $i$). Thus
$wi + iw = \alpha_0 \in F$. Similarly $wj + jw = \beta_0 \in F$ and $wk + kw = \gamma_0 \in F$.
Let

$$z = w + \frac{\alpha_0}{2}\,i + \frac{\beta_0}{2}\,j + \frac{\gamma_0}{2}\,k.$$

Then

$$zi + iz = wi + iw + \frac{\alpha_0}{2}(i^2 + i^2) + \frac{\beta_0}{2}(ji + ij) + \frac{\gamma_0}{2}(ki + ik) = \alpha_0 - \alpha_0 = 0;$$

similarly $zj + jz = 0$ and $zk + kz = 0$. We claim these relations force $z$
to be 0. For $0 = zk + kz = zij + ijz = (zi + iz)j + i(jz - zj) =
i(jz - zj)$ since $zi + iz = 0$. However $i \neq 0$, and since we are in a
division ring, it follows that $jz - zj = 0$. But $jz + zj = 0$. Thus $2jz = 0$,
and since $2j \neq 0$ we have that $z = 0$. Going back to the expression for
$z$ we get

$$w + \frac{\alpha_0}{2}\,i + \frac{\beta_0}{2}\,j + \frac{\gamma_0}{2}\,k = 0,$$

hence $w \in T$, contradicting $w \notin T$. Thus, indeed, $w \in T$. Since $w =
(u - \alpha)/\beta$, $u = \beta w + \alpha$ and so $u \in T$. We have proved that any element
in $D$ is in $T$. Since $T \subset D$ we conclude that $D = T$; because $T$ is
isomorphic to the real quaternions we now get that $D$ is isomorphic to the
division ring of real quaternions. This, however, is just the statement of
the theorem.

Problems 

1. If the division ring D is finite-dimensional, as a vector space, over the 
field F contained in the center of D, prove that D is algebraic over F. 

2. Give an example of a field K algebraic over another field F but not 
finite-dimensional over F. 

3. If A is a ring algebraic over a field F and A has no zero divisors prove 
that A is a division ring. 

7.4 Integral Quaternions and the Four-Square Theorem 

In Chapter 3 we considered a certain special class of integral domains 
called Euclidean rings. When the results about this class of rings were 
applied to the ring of Gaussian integers, we obtained, as a consequence, 
the famous result of Fermat that every prime number of the form 4n + 1 
is the sum of two squares. 

We shall now consider a particular subring of the quaternions which, in 
all ways except for its lack of commutativity, will look like a Euclidean ring. 
Because of this it will be possible to explicitly characterize all its left-ideals. 
This characterization of the left-ideals will lead us quickly to a proof of the 
classic theorem of Lagrange that every positive integer is a sum of four
squares.

Let $Q$ be the division ring of real quaternions. In $Q$ we now proceed to
introduce an adjoint operation, $*$, by making the

DEFINITION For $x = \alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$ in $Q$ the adjoint of $x$,
denoted by $x^*$, is defined by $x^* = \alpha_0 - \alpha_1i - \alpha_2j - \alpha_3k$.

LEMMA 7.4.1 The adjoint in $Q$ satisfies

1. $x^{**} = x$;

2. $(\delta x + \gamma y)^* = \delta x^* + \gamma y^*$;

3. $(xy)^* = y^*x^*$;

for all $x, y$ in $Q$ and all real $\delta$ and $\gamma$.

Proof. If $x = \alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$ then $x^* = \alpha_0 - \alpha_1i - \alpha_2j - \alpha_3k$,
whence $x^{**} = (x^*)^* = \alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$, proving part 1.

Let $x = \alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$ and $y = \beta_0 + \beta_1i + \beta_2j + \beta_3k$ be in $Q$
and let $\delta$ and $\gamma$ be arbitrary real numbers. Thus $\delta x + \gamma y = (\delta\alpha_0 + \gamma\beta_0) +
(\delta\alpha_1 + \gamma\beta_1)i + (\delta\alpha_2 + \gamma\beta_2)j + (\delta\alpha_3 + \gamma\beta_3)k$; therefore by the definition
of the $*$, $(\delta x + \gamma y)^* = (\delta\alpha_0 + \gamma\beta_0) - (\delta\alpha_1 + \gamma\beta_1)i - (\delta\alpha_2 + \gamma\beta_2)j -
(\delta\alpha_3 + \gamma\beta_3)k = \delta(\alpha_0 - \alpha_1i - \alpha_2j - \alpha_3k) + \gamma(\beta_0 - \beta_1i - \beta_2j - \beta_3k) =
\delta x^* + \gamma y^*$. This, of course, proves part 2.

In light of part 2, to prove 3 it is enough to do so for a basis of $Q$ over
the reals. We prove it for the particular basis $1, i, j, k$. Now $ij = k$, hence
$(ij)^* = k^* = -k = ji = (-j)(-i) = j^*i^*$. Similarly $(ik)^* = k^*i^*$,
$(jk)^* = k^*j^*$. Also $(i^2)^* = (-1)^* = -1 = (i^*)^2$, and similarly for $j$
and $k$. Since part 3 is true for the basis elements and part 2 holds, 3 is true
for all linear combinations of the basis elements with real coefficients,
hence 3 holds for arbitrary $x$ and $y$ in $Q$.

DEFINITION If $x \in Q$ then the norm of $x$, denoted by $N(x)$, is defined
by $N(x) = xx^*$.

Note that if $x = \alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$ then $N(x) = xx^* = (\alpha_0 + \alpha_1i +
\alpha_2j + \alpha_3k)(\alpha_0 - \alpha_1i - \alpha_2j - \alpha_3k) = \alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2$; therefore
$N(0) = 0$ and $N(x)$ is a positive real number for $x \neq 0$ in $Q$. In particular,
for any real number $\alpha$, $N(\alpha) = \alpha^2$. If $x \neq 0$ note that $x^{-1} = [1/N(x)]x^*$.

LEMMA 7.4.2 For all $x, y \in Q$, $N(xy) = N(x)N(y)$.

Proof. By the very definition of norm, $N(xy) = (xy)(xy)^*$; by part 3
of Lemma 7.4.1, $(xy)^* = y^*x^*$ and so $N(xy) = xyy^*x^*$. However, $yy^* =
N(y)$ is a real number, and thereby it is in the center of $Q$; in particular it
must commute with $x^*$. Consequently $N(xy) = x(yy^*)x^* = (xx^*)(yy^*) =
N(x)N(y)$.

As an immediate consequence of Lemma 7.4.2 we obtain

LEMMA 7.4.3 (Lagrange Identity) If $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ and $\beta_0, \beta_1, \beta_2, \beta_3$
are real numbers then

$$(\alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2)(\beta_0^2 + \beta_1^2 + \beta_2^2 + \beta_3^2) = (\alpha_0\beta_0 - \alpha_1\beta_1 - \alpha_2\beta_2 - \alpha_3\beta_3)^2 + (\alpha_0\beta_1 + \alpha_1\beta_0 + \alpha_2\beta_3 - \alpha_3\beta_2)^2 + (\alpha_0\beta_2 - \alpha_1\beta_3 + \alpha_2\beta_0 + \alpha_3\beta_1)^2 + (\alpha_0\beta_3 + \alpha_1\beta_2 - \alpha_2\beta_1 + \alpha_3\beta_0)^2.$$

Proof. Of course there is one obvious proof of this result, namely, 
multiply everything out and compare terms. 

However, an easier way both to reconstruct the result at will and, at the 
same time, to prove it, is to notice that the left-hand side is N(x)N(y) 
while the right-hand side is $N(xy)$ where $x = \alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k$ and
$y = \beta_0 + \beta_1i + \beta_2j + \beta_3k$. By Lemma 7.4.2, $N(x)N(y) = N(xy)$, ergo
the Lagrange identity. 
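
The identity is easy to test on integers. The following sketch (with hypothetical helper names `qmul` and `norm`, and quaternions stored as 4-tuples of coefficients of 1, i, j, k) computes both sides; it illustrates the statement and is no substitute for the proof.

```python
import random

def qmul(x, y):
    """Quaternion product; the four components are exactly the four
    bilinear expressions squared on the right-hand side of the identity."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def norm(x):
    """N(x) = x x* = sum of the squares of the four coefficients."""
    return sum(t * t for t in x)

def identity_holds(x, y):
    """Check N(x) N(y) == N(xy) for one pair of quaternions."""
    return norm(x) * norm(y) == norm(qmul(x, y))

def random_check(trials=200, bound=50, seed=1):
    """Check the identity on random integer 4-tuples."""
    rng = random.Random(seed)
    pick = lambda: tuple(rng.randint(-bound, bound) for _ in range(4))
    return all(identity_holds(pick(), pick()) for _ in range(trials))
```

For x = 1 + 2i + 3j + 4k and y = 5 + 6i + 7j + 8k the product is -60 + 12i + 30j + 24k, so 30 * 174 = 5220 = 60^2 + 12^2 + 30^2 + 24^2.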

The Lagrange identity says that the sum of four squares times the sum 
of four squares is again, in a very specific way, the sum of four squares. A 
very striking result of Adolf Hurwitz says that if the sum of n squares times 
the sum of n squares is again a sum of n squares, where this last sum has 
terms computed bilinearly from the other two sums, then n = 1, 2, 4, or 8. 
There is, in fact, an identity for the product of sums of eight squares but 
it is too long and cumbersome to write down here. 

Now is the appropriate time to introduce the Hurwitz ring of integral
quaternions. Let $\zeta = \tfrac{1}{2}(1 + i + j + k)$ and let

$$H = \{m_0\zeta + m_1i + m_2j + m_3k \mid m_0, m_1, m_2, m_3 \text{ integers}\}.$$

LEMMA 7.4.4 $H$ is a subring of $Q$. If $x \in H$ then $x^* \in H$ and $N(x)$ is a
positive integer for every nonzero $x$ in $H$.

We leave the proof of Lemma 7.4.4 to the reader. It should offer no
difficulties.

In some ways $H$ might appear to be a rather contrived ring. Why use the
quaternion $\zeta$? Why not merely consider the more natural ring $Q_0 =
\{m_0 + m_1i + m_2j + m_3k \mid m_0, m_1, m_2, m_3 \text{ are integers}\}$? The answer is that
$Q_0$ is not large enough, whereas $H$ is, for the key lemma which follows to
hold in it. But we want this next lemma to be true in the ring at our disposal
for it allows us to characterize its left-ideals. This, perhaps, indicates why
we (or rather Hurwitz) chose to work in $H$ rather than in $Q_0$.

LEMMA 7.4.5 (Left-Division Algorithm) Let $a$ and $b$ be in $H$ with
$b \neq 0$. Then there exist two elements $c$ and $d$ in $H$ such that $a = cb + d$ and
$N(d) < N(b)$.

Proof. Before proving the lemma, let's see what it tells us. If we look
back in the section in Chapter 3 which deals with Euclidean rings, we can
see that Lemma 7.4.5 assures us that except for its lack of commutativity $H$
has all the properties of a Euclidean ring. The fact that elements in $H$ may
fail to commute will not bother us. True, we must be a little careful not to
jump to erroneous conclusions; for instance $a = cb + d$ but we have no
right to assume that $a$ is also equal to $bc + d$, for $b$ and $c$ might not commute.
But this will not influence any argument that we shall use.

In order to prove the lemma we first do so for a very special case, namely,
that one in which $a$ is an arbitrary element of $H$ but $b$ is a positive integer
$n$. Suppose that $a = t_0\zeta + t_1i + t_2j + t_3k$ where $t_0, t_1, t_2, t_3$ are integers and
that $b = n$ where $n$ is a positive integer. Let $c = x_0\zeta + x_1i + x_2j + x_3k$
where $x_0, x_1, x_2, x_3$ are integers yet to be determined. We want to choose
them in such a manner as to force $N(a - cn) < N(n) = n^2$. But

$$a - cn = \Big(t_0\,\frac{1 + i + j + k}{2} + t_1i + t_2j + t_3k\Big) - nx_0\,\frac{1 + i + j + k}{2} - nx_1i - nx_2j - nx_3k$$

$$= \tfrac{1}{2}(t_0 - nx_0) + \tfrac{1}{2}\big(t_0 + 2t_1 - n(x_0 + 2x_1)\big)i + \tfrac{1}{2}\big(t_0 + 2t_2 - n(x_0 + 2x_2)\big)j + \tfrac{1}{2}\big(t_0 + 2t_3 - n(x_0 + 2x_3)\big)k.$$

If we could choose the integers $x_0, x_1, x_2, x_3$ in such a way as to make
$|t_0 - nx_0| \leq \tfrac{1}{2}n$, $|t_0 + 2t_1 - n(x_0 + 2x_1)| \leq n$, $|t_0 + 2t_2 - n(x_0 + 2x_2)| \leq n$
and $|t_0 + 2t_3 - n(x_0 + 2x_3)| \leq n$ then we would have

$$N(a - cn) = \frac{(t_0 - nx_0)^2}{4} + \frac{(t_0 + 2t_1 - n(x_0 + 2x_1))^2}{4} + \frac{(t_0 + 2t_2 - n(x_0 + 2x_2))^2}{4} + \frac{(t_0 + 2t_3 - n(x_0 + 2x_3))^2}{4}$$

$$\leq \tfrac{1}{16}n^2 + \tfrac{1}{4}n^2 + \tfrac{1}{4}n^2 + \tfrac{1}{4}n^2 < n^2 = N(n),$$

which is the desired result. But now we claim this can always be done:

1. There is an integer $x_0$ such that $t_0 = x_0n + r$ where $-\tfrac{1}{2}n \leq r \leq \tfrac{1}{2}n$;
for this $x_0$, $|t_0 - x_0n| = |r| \leq \tfrac{1}{2}n$.

2. With $x_0$ now fixed, there is an integer $k$ such that $t_0 + 2t_1 = kn + r$ and
$0 \leq r < n$. If $k - x_0$ is even, put $2x_1 = k - x_0$; then $t_0 + 2t_1 = (2x_1 + x_0)n + r$
and $|t_0 + 2t_1 - (2x_1 + x_0)n| = r < n$. If, on the other hand, $k - x_0$ is
odd, put $2x_1 = k - x_0 + 1$; thus $t_0 + 2t_1 = (2x_1 + x_0 - 1)n + r =
(2x_1 + x_0)n + r - n$, whence $|t_0 + 2t_1 - (2x_1 + x_0)n| = |r - n| \leq n$
since $0 \leq r < n$. Therefore we can find an integer $x_1$ satisfying
$|t_0 + 2t_1 - (2x_1 + x_0)n| \leq n$.

3. As in part 2, we can find integers $x_2$ and $x_3$ which satisfy $|t_0 + 2t_2 -
(2x_2 + x_0)n| \leq n$ and $|t_0 + 2t_3 - (2x_3 + x_0)n| \leq n$, respectively.

In the special case in which $a$ is an arbitrary element of $H$ and $b$ is a
positive integer we have now shown the lemma to be true.

We go to the general case wherein $a$ and $b$ are arbitrary elements of $H$
and $b \neq 0$. By Lemma 7.4.4, $n = bb^*$ is a positive integer; thus there exists
a $c \in H$ such that $ab^* = cn + d_1$ where $N(d_1) < N(n)$. Thus $N(ab^* - cn) <
N(n)$; but $n = bb^*$ whence we get $N(ab^* - cbb^*) < N(n)$, and so
$N((a - cb)b^*) < N(n) = N(bb^*)$. By Lemma 7.4.2 this reduces to
$N(a - cb)N(b^*) < N(b)N(b^*)$; since $N(b^*) > 0$ we get $N(a - cb) < N(b)$.
Putting $d = a - cb$ we have $a = cb + d$ where $N(d) < N(b)$. This
completely proves the lemma.
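
A left-division in H can also be computed directly. The sketch below does not follow the proof above step by step; instead it uses a swapped-in technique, rounding the exact quotient $ab^{-1} = ab^*/N(b)$ to a nearest element of H, which yields $N(d) \leq N(b)/2$. Quaternions are assumed to be stored as 4-tuples of coefficients of 1, i, j, k (so elements of H have all coordinates integers or all coordinates half-odd-integers), and all function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def qmul(x, y):
    """Quaternion product of coefficient 4-tuples (1, i, j, k order)."""
    a0, a1, a2, a3 = x
    b0, b1, b2, b3 = y
    return (a0*b0 - a1*b1 - a2*b2 - a3*b3,
            a0*b1 + a1*b0 + a2*b3 - a3*b2,
            a0*b2 - a1*b3 + a2*b0 + a3*b1,
            a0*b3 + a1*b2 - a2*b1 + a3*b0)

def conj(x):
    """The adjoint x*."""
    return (x[0], -x[1], -x[2], -x[3])

def norm(x):
    """N(x) = x x*."""
    return sum(t * t for t in x)

def is_hurwitz(x):
    """True if all coordinates are integers or all are half-odd-integers."""
    doubled = [2 * Fraction(t) for t in x]
    if any(d.denominator != 1 for d in doubled):
        return False
    return len({int(d) % 2 for d in doubled}) == 1

def nearest_hurwitz(q):
    """Closest candidate among the integer rounding and the half-integer rounding."""
    ints = tuple(Fraction(floor(t + Fraction(1, 2))) for t in q)
    halves = tuple(Fraction(floor(t)) + Fraction(1, 2) for t in q)
    dist = lambda c: norm(tuple(s - t for s, t in zip(q, c)))
    return min((ints, halves), key=dist)

def left_divide(a, b):
    """Return (c, d) in H with a = c b + d and N(d) < N(b), for b != 0."""
    a = tuple(Fraction(t) for t in a)
    b = tuple(Fraction(t) for t in b)
    n = norm(b)
    q = tuple(t / n for t in qmul(a, conj(b)))   # exact quotient a b^{-1}
    c = nearest_hurwitz(q)
    d = tuple(s - t for s, t in zip(a, qmul(c, b)))
    return c, d
```

The bound holds because, coordinate by coordinate, the squared errors of the two roundings add up to at most 1/4, so the smaller of the two total errors is at most 1/2; then $N(d) = N(ab^{-1} - c)N(b)$ by Lemma 7.4.2.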

As in the commutative case we are able to deduce from Lemma 7.4.5 

LEMMA 7.4.6 Let $L$ be a left-ideal of $H$. Then there exists an element $u \in L$
such that every element in $L$ is a left-multiple of $u$; in other words, there exists
$u \in L$ such that every $x \in L$ is of the form $x = ru$ where $r \in H$.

Proof. If $L = (0)$ there is nothing to prove, merely put $u = 0$.

Therefore we may assume that $L$ has nonzero elements. The norms
of the nonzero elements are positive integers (Lemma 7.4.4) whence there
is an element $u \neq 0$ in $L$ whose norm is minimal over the nonzero elements
of $L$. If $x \in L$, by Lemma 7.4.5, $x = cu + d$ where $N(d) < N(u)$. However
$d$ is in $L$ because both $x$ and $u$, and so $cu$, are in $L$ which is a left-ideal.
Thus $N(d) = 0$ and so $d = 0$. From this $x = cu$ is a consequence.

Before we can prove the four-square theorem, which is the goal of this 
section, we need one more lemma, namely 

LEMMA 7.4.7 If $a \in H$ then $a^{-1} \in H$ if and only if $N(a) = 1$.

Proof. If both $a$ and $a^{-1}$ are in $H$, then by Lemma 7.4.4 both $N(a)$
and $N(a^{-1})$ are positive integers. However, $aa^{-1} = 1$, hence, by Lemma
7.4.2, $N(a)N(a^{-1}) = N(aa^{-1}) = N(1) = 1$. This forces $N(a) = 1$.

On the other hand, if $a \in H$ and $N(a) = 1$, then $aa^* = N(a) = 1$ and
so $a^{-1} = a^*$. But, by Lemma 7.4.4, since $a \in H$ we have that $a^* \in H$,
and so $a^{-1} = a^*$ is also in $H$.

We now have determined enough of the structure of H to use it effectively 
to study properties of the integers. We prove the famous classical theorem 
of Lagrange, 

THEOREM 7.4.1 Every positive integer can be expressed as the sum of squares 
of four integers. 

Proof. Given a positive integer $n$ we claim in the theorem that $n =
x_0^2 + x_1^2 + x_2^2 + x_3^2$ for four integers $x_0, x_1, x_2, x_3$. Since every integer
factors into a product of prime numbers, if every prime number were
realizable as a sum of four squares, in view of Lagrange's identity (Lemma
7.4.3) every integer would be expressible as a sum of four squares. We
have reduced the problem to consider only prime numbers $n$. Certainly the
prime number 2 can be written as $1^2 + 1^2 + 0^2 + 0^2$, a sum of four
squares.

Thus, without loss of generality, we may assume that n is an odd prime 
number. As is customary we denote it by p. 

Consider the quaternions $W_p$ over $J_p$, the integers mod $p$; $W_p =
\{\alpha_0 + \alpha_1i + \alpha_2j + \alpha_3k \mid \alpha_0, \alpha_1, \alpha_2, \alpha_3 \in J_p\}$. $W_p$ is a finite ring; moreover,
since $p \neq 2$ it is not commutative for $ij = -ji \neq ji$. Thus, by
Wedderburn's theorem it cannot be a division ring, hence by Problem 1 at the
end of Section 3.5, it must have a left-ideal which is neither $(0)$
nor $W_p$.

But then the two-sided ideal $V$ in $H$ defined by $V = \{x_0\zeta + x_1i + x_2j +
x_3k \mid p \text{ divides all of } x_0, x_1, x_2, x_3\}$ cannot be a maximal left-ideal of $H$,
since $H/V$ is isomorphic to $W_p$. (Prove!) (If $V$ were a maximal left-ideal
in $H$, $H/V$, and so $W_p$, would have no left-ideals other than $(0)$ and
$H/V$.)

Thus there is a left-ideal $L$ of $H$ satisfying: $L \neq H$, $L \neq V$, and $L \supset V$.
By Lemma 7.4.6, there is an element $u \in L$ such that every element in $L$ is
a left-multiple of $u$. Since $p \in V$, $p \in L$, whence $p = cu$ for some $c \in H$.
Since $u \notin V$, $c$ cannot have an inverse in $H$, otherwise $u = c^{-1}p$ would be
in $V$. Thus $N(c) > 1$ by Lemma 7.4.7. Since $L \neq H$, $u$ cannot have an
inverse in $H$, whence $N(u) > 1$. Since $p = cu$, $p^2 = N(p) = N(cu) =
N(c)N(u)$. But $N(c)$ and $N(u)$ are integers, since both $c$ and $u$ are in $H$,
both are larger than 1 and both divide $p^2$. The only way this is possible
is that $N(c) = N(u) = p$.

Since $u \in H$, $u = m_0\zeta + m_1i + m_2j + m_3k$ where $m_0, m_1, m_2, m_3$ are
integers; thus $2u = 2m_0\zeta + 2m_1i + 2m_2j + 2m_3k = (m_0 + m_0i + m_0j + m_0k) +
2m_1i + 2m_2j + 2m_3k = m_0 + (2m_1 + m_0)i + (2m_2 + m_0)j + (2m_3 + m_0)k$.
Therefore $N(2u) = m_0^2 + (2m_1 + m_0)^2 + (2m_2 + m_0)^2 + (2m_3 + m_0)^2$.
But $N(2u) = N(2)N(u) = 4p$ since $N(2) = 4$ and $N(u) = p$. We have
shown that $4p = m_0^2 + (2m_1 + m_0)^2 + (2m_2 + m_0)^2 + (2m_3 + m_0)^2$. We
are almost done.

To finish the proof we introduce an old trick of Euler's: If $2a = x_0^2 +
x_1^2 + x_2^2 + x_3^2$ where $a, x_0, x_1, x_2$ and $x_3$ are integers, then $a = y_0^2 +
y_1^2 + y_2^2 + y_3^2$ for some integers $y_0, y_1, y_2, y_3$. To see this note that, since
$2a$ is even, the $x$'s are all even, all odd or two are even and two are odd.
At any rate in all three cases we can renumber the $x$'s and pair them in
such a way that

$$y_0 = \frac{x_0 + x_1}{2}, \quad y_1 = \frac{x_0 - x_1}{2}, \quad y_2 = \frac{x_2 + x_3}{2}, \quad \text{and} \quad y_3 = \frac{x_2 - x_3}{2}$$

are all integers. But

$$y_0^2 + y_1^2 + y_2^2 + y_3^2 = \Big(\frac{x_0 + x_1}{2}\Big)^2 + \Big(\frac{x_0 - x_1}{2}\Big)^2 + \Big(\frac{x_2 + x_3}{2}\Big)^2 + \Big(\frac{x_2 - x_3}{2}\Big)^2$$

$$= \tfrac{1}{2}(x_0^2 + x_1^2 + x_2^2 + x_3^2) = \tfrac{1}{2}(2a) = a.$$

Since $4p$ is a sum of four squares, by the remark just made $2p$ also is;
since $2p$ is a sum of four squares, $p$ also must be such a sum. Thus $p =
a_0^2 + a_1^2 + a_2^2 + a_3^2$ for some integers $a_0, a_1, a_2, a_3$ and Lagrange's
theorem is established.
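
Euler's halving trick is entirely constructive, and the theorem itself can be confirmed for small n by search. The sketch below is an illustration under these assumptions: `euler_halve` implements the pairing by parity described above, and `four_squares` is a plain brute-force search, not the method of the proof. Both names are hypothetical.

```python
from math import isqrt

def euler_halve(xs):
    """Given integers x0..x3 whose squares sum to an even number 2a,
    return integers y0..y3 whose squares sum to a."""
    if sum(t * t for t in xs) % 2 != 0:
        raise ValueError("sum of squares must be even")
    # Sort by parity so that (x0, x1) and (x2, x3) each have equal parity;
    # an even sum of squares forces an even number of odd entries.
    x0, x1, x2, x3 = sorted(xs, key=lambda t: t % 2)
    return ((x0 + x1) // 2, (x0 - x1) // 2, (x2 + x3) // 2, (x2 - x3) // 2)

def four_squares(n):
    """Brute-force search for (a, b, c, d) with a^2 + b^2 + c^2 + d^2 == n, n >= 0."""
    for a in range(isqrt(n) + 1):
        for b in range(a, isqrt(n - a * a) + 1):
            for c in range(b, isqrt(n - a * a - b * b) + 1):
                rest = n - a * a - b * b - c * c
                d = isqrt(rest)
                if d * d == rest:
                    return (a, b, c, d)
    return None
```

For example, starting from 28 = 1^2 + 1^2 + 1^2 + 5^2 (the case 4p with p = 7), one application of `euler_halve` gives a representation of 14 and a second gives one of 7, mirroring the last step of the proof.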

This theorem itself is the starting point of a large research area in number
theory, the so-called Waring problem. This asks if every positive integer can be
written as a sum of a fixed number of kth powers. For instance it can be shown
that every positive integer is a sum of nine cubes, nineteen fourth powers, etc.
The Waring problem was shown to have an affirmative answer, in this
century, by the great mathematician Hilbert.

Problems 

1. Prove Lemma 7.4.4. 

2. Find all the elements $a$ in $Q_0$ such that $a^{-1}$ is also in $Q_0$.

3. Prove that there are exactly 24 elements $a$ in $H$ such that $a^{-1}$ is also
in $H$. Determine all of them.

4. Give an example of an $a$ and $b$, $b \neq 0$, in $Q_0$ such that it is impossible
to find $c$ and $d$ in $Q_0$ satisfying $a = cb + d$ where $N(d) < N(b)$.

5. Prove that if $a \in H$ then there exist integers $\alpha, \beta$ such that $a^2 + \alpha a +
\beta = 0$.

6. Prove that there is a positive integer which cannot be written as the
sum of three squares.

*7. Exhibit an infinite number of positive integers which cannot be written 
as the sum of three squares. 

Supplementary Reading 

For a deeper discussion of finite fields: Albert, A. A., Fundamental Concepts of Higher 
Algebra. Chicago: University of Chicago Press, 1956. 
For many proofs of the four-square theorem and a discussion of the Waring problem: 
Hardy, G. H., and Wright, E. M., An Introduction to the Theory of Numbers, 4th ed. 
New York: Oxford University Press, 1960. 

For another proof of the Wedderburn theorem: Artin, E., “Über einen Satz von 
Herrn J. H. M. Wedderburn,” Abhandlungen, Hamburg Mathematisches Seminar, 
Vol. 5 (1928), pages 245-50. 





Index 


Abel, 237, 252, 256 
Abelian group, 28 

structure of finite, 109, 204 
structure of finitely generated, 203 
Adjoint(s), 318, 321 
Hermitian, 318, 319, 322, 336, 339, 
340 

quaternions, 372 

Adjunction of element to a field, 210 
Albert, 356, 377 
Algebra, 262 

algebraic division, 368 
of all n x n matrices over F, 278, 
279 

Boolean, 9, 130 
fundamental theorem of, 337 
linear, 260 

of linear transformations, 261 
Algebraic of degree n, 212, 213 
Algebraic division algebra, 368 
Algebraic element, 209, 210 
Algebraic extension, 213 
Algebraic integer, 215 
Algebraic number(s), 214-216 
Algorithm 
division, 155 
Euclidean, 18 
left-division, 373 
Alperin, 119 


Alternating group, 80, 256 
Cancellation laws, 34 
Canonical form(s), 285 
Jordan, 299, 301, 302 
rational, 305, 306, 308 
Cardan’s formulas, 251 
Cartesian product, 5, 6 
Cauchy, 61, 86, 87 
Cauchy’s theorem, 61, 87 
Cayley, 71 

Cayley-Hamilton theorem, 263, 309, 
334, 335 

Cayley’s theorem, 71, 262 
Center of a group, 47, 68 
Centralizer, 47 

Characteristic of integral domain, 129, 
232, 235, 357 

Characteristic polynomial, 309, 332 
Characteristic root(s), 270, 286-289 
multiplicity of, 303 
Characteristic subgroup, 70 
Characteristic vector, 271 
Characteristic zero, 129, 232, 235 
Choice, axiom of, 138, 367 
Class(es) 

congruence, 22, 353, 354 
conjugate, 83, 89, 361 
equivalence, 7 
similarity, 285 
Class equation, 85, 361 
Closure under operation, 27 
Coefficients, 153 
Cofactor, 334 
Column of a matrix, 277 
Combination, linear, 177 
Commutative group, 28 
Commutative law, 23 
Commutative ring(s), 121 
polynomial rings over, 161 
Commutator, 252 

Commutator subgroup(s), 65, 70, 117, 
252, 253 

Companion matrix, 307 
Complement, 5 
Complement, orthogonal, 195 
Complex vector space, 191 
Composition of mappings, 13 
Congruence class, 22, 353, 354 
Congruence modulo a subgroup, 39 
Congruence modulo n, 22 
Congruent, 352 
Conjugacy, 83 
Conjugate, 83 


Conjugate class(es), 83, 89, 361 
Conjugate elements, 83 
Conjugate subgroups, 99 
Constructible, 228, 230 
Constructible number, 228 
Construction, invariant, 187, 188 
Construction with straightedge and 
compass, 228 

Content of polynomial, 159, 163 
Correspondence, one-to-one, 15 
Coset 

double, 49, 97, 98 
left, 47 
right, 40 

Cramer’s rule, 331 
Criterion 

Eisenstein, 160, 240, 249 
Euler, 360 

Cube, duplicating of, 231 
Cycle decomposition, 78 
Cyclic group, 30, 38, 49 
generator of, 48 
Cyclic module, 202 
Cyclic subgroup, 38 
Cyclic subspace, 296, 306 
Cyclotomic polynomial, 250, 362 

De Morgan rules, 8 

Decomposable set of linear transformations, 291 

Decomposition, cycle, 78 
Definite, positive, 345 
Degree n 

algebraic of, 212, 213 
alternating group of, 80, 256 
of an extension, 208 
general polynomial of, 251 
of polynomial, 154, 162 
symmetric group of, 28, 75, 241, 
253-257, 284 
Der Waerden, Van, 259 
Derivative, 158, 232, 233 
Desargues’ theorem, 361 
Determinant, 322 

of linear transformation, 329 
of matrix, 324 

of system of linear equations, 330 
Diagonal matrix, 282, 305 
Diagonal subset, 6 
Diagonalizable, 305 
Dickson, 356 
Difference module, 202 
Difference set, 5 
Dihedral group, 54, 81 
Dimension, 181 
Diophantos, 356 
Direct product of groups, 103 
external, 104, 105 
internal, 106 
Direct sum 
external, 175 
internal, 174, 175 
of modules, 202 
Disjoint sets, 4 
mutually, 5 

Distributive law(s), 23, 121 
Divisibility, 144, 145 
Division algebra, algebraic, 368 
Division algorithm for polynomials, 155 
Division ring, 126 
finite, 360 
Divisor(s), 18 

elementary, 308, 309, 310 
greatest common, 18, 145 
Domain 

integral, 126 
unique factorization, 163 
Dot product, 192 
Double coset, 49, 97, 98 
Dual basis, 187 
Dual, second, 188 
Dual space, 184, 187 
Duplicating the cube, 231 


Eigenvalue, 270 

Eisenstein criterion, 160, 240, 249 
Element(s) 

algebraic, 209, 210 
conjugate, 83 
identity, 27, 28 
order of, 43 

order of (in a module), 206 
period of, 43 
prime, 146, 163 
separable, 236 

Elementary divisors of a linear transformation, 308, 309, 310 
Elementary symmetric functions, 242, 
243 

Empty set, 2 
Equality 

of mappings, 13 
of sets, 2 


Equation(s) 
class, 85, 361 

linear homogeneous, 189, 190 
rank of system of linear, 190 
secular, 332 
Equivalence class, 7 
Equivalence relation, 6 
Euclidean algorithm, 18 
Euclidean rings, 143, 371 
Euler, 43, 356, 376 
Euler criterion, 360 
Euler phi-function, 43, 71, 227, 250 
Even permutation, 78, 79 
Extension 

algebraic, 213 
degree of, 208 
field, 207 
finite, 208-212 
normal, 244-248 
separable, 236, 237 
simple, 235, 236 

External direct product, 104, 105 
External direct sum, 175 


Fermat, 44, 144, 149, 152, 356, 366, 371 
Fermat theorem, 44, 152, 366 
Fermat theorem, little, 44, 366 
Field(s), 126, 127, 207 

adjunction of element to, 210 
automorphism of, 237 
extension, 207 
finite, 122, 356 
perfect, 236 
of quotients, 140 
of rational functions, 162, 241 
of rational functions in n-variables, 241 
splitting, 222-227, 245 
of symmetric rational functions, 241 
Finite abelian group(s), 109 

fundamental theorem of, 109, 204 
invariants of, 111 
Finite characteristic, 129 
Finite dimensional, 178 
Finite extension, 208-212 
Finite field, 122, 356 
Finite group, 28 

Finitely generated abelian group, 202 
Finitely generated modules, 202 
fundamental theorem on, 203 
Fixed field of group of automorphisms, 
238 
Form(s) 

canonical, 285 

Jordan canonical, 299, 301, 302 
rational canonical, 305-308 
real quadratic, 350 
triangular, 285 
Four-square theorem, 371 
Frobenius, 356, 368, 369 
Frobenius theorem, 369 
Functional, linear, 187, 200 
Functions 

elementary symmetric, 242, 243 
rational, 162, 241 
symmetric rational, 241 
Fundamental theorem 
of algebra, 337 

of finite abelian groups, 109, 204 
of finitely generated modules, 203 
of Galois theory, 247 

Galois, 50, 207 
Galois group, 237 
Galois theory, 237-259 
fundamental theorem of, 247 
Gauss’ lemma, 160, 163, 164 
Gaussian integers, 149 
Gelfond, 216 

General polynomial of degree n, 251 
Generator of cyclic group, 48 
Gram-Schmidt orthogonalization process, 196 

Greatest common divisor, 18, 145 
Group(s), 28 

abelian, 28, 109, 203, 204 

alternating, 80, 256 

automorphism(s) of, 66, 67 

of automorphisms, fixed field of, 238 

of automorphisms of K over F, 239 

center of, 47, 68 

commutative, 28 

cyclic, 30, 38, 49 

dihedral, 54, 81 

direct product of, 103 

factor, 52 

finite, 28 

Galois, 237 

generator of cyclic, 48 

homomorphism(s) of, 54 

of inner automorphisms, 68 

isomorphic, 58 

isomorphism(s) of, 58 

nilpotent, 117 


order of, 28 

of outer automorphisms, 70 
permutation, 75 
of quaternion units, 81 
quotient, 52 
simple, 60 
solvable, 116, 252 

symmetric, 28, 75, 241, 253-257, 284 


Hall, 119 
Halmos, 206, 354 
Hamilton, 124, 334, 356 
Hardy, 378 
Hermite, 216, 218 

Hermitian adjoint, 318, 319, 322, 336, 
339, 340 

Hermitian linear transformation, 336, 
341 

Hermitian matrix, 319, 322, 336 
Hexagon, regular, 232 
Higher commutator subgroups, 252, 253 
Hilbert, 216, 377 
Hom(U, V), 173 

Homogeneous equations, linear, 189, 190 
Homomorphism(s), 54, 131 
of groups, 54 
kernel of, 56, 131 
of modules, 205 
of rings, 131 
of vector-spaces, 173 
Hurwitz, 216, 356, 373 


(i,j) entry, 277 
Ideal(s), 133, 134, 137 
left, 136 
maximal, 138 
prime, 167 
principal, 144 
radical of, 167 
right, 136 
Idempotent, 268 
Identity(ies) 

Lagrange’s, 373 
Newton’s, 249 
Identity element, 27, 28 
Identity mapping, 11 
Image, 11 

inverse, 12, 58 
of set, 12 

Independence, linear, 177 
Index 
of H in G, 41 
of nilpotence, 268, 294 
Index set, 5 
Inequality 
Bessel, 200 
Schwarz, 194 
triangle, 199 

Inertia, Sylvester’s law of, 352 
Infinite set, 17 
Inner automorphism(s), 68 
group of, 68 
Inner product, 193 
Inner product spaces, 191, 337 
Integer(s), 18 
algebraic, 215 
Gaussian, 149 
partition of, 88 
relatively prime, 19 
Integers modulo n, 22, 23 
Integral domain, 126 
characteristic of, 129, 232, 235, 237 
Integral quaternions, 371 
Internal direct product, 106 
Internal direct sum, 174, 175 
Intersection of sets, 3, 4 
Invariant construction (or proof), 187, 188 

Invariant subspace, 285, 290 
Invariants 

of finite abelian group, 111 
of nilpotent linear transformation, 296 
Inverse element, 28 
Inverse image, 12, 58 
Inverse of mapping, 15 
Invertible linear transformation, 264 
Irreducible elements, 163 
Irreducible module, 206 
Irreducible polynomial, 156 
Irreducible set of linear transformations, 291 

Isomorphic groups, 58 
Isomorphic rings, 133 
Isomorphic vector spaces, 173 
Isomorphism 
of groups, 58 
of modules, 205 
of rings, 133 
of vector spaces, 173 

Jacobson, 355, 367 
Jacobson’s lemma, 316, 320 


Jacobson’s theorem, 367 
Jordan block, 301 

Jordan canonical form, 299, 301, 302 
Kaplansky, 259 

Kernel of homomorphism, 56, 131 

Lagrange, 40, 356, 371 
Lagrange’s identity, 373 
Lagrange’s theorem, 40, 375 
Law(s) 

associative, 14, 23, 27, 28, 36 
cancellation, 34 
commutative, 23 
distributive, 23, 121 
of inertia, Sylvester’s, 352 
Sylvester’s, 352 

Least common multiple, 23, 149 
Left coset, 47 

Left-division algorithm, 373 
Left ideal, 136 
Left-invertible, 264 
Lemma 

Gauss’, 160, 163, 164 
Jacobson’s, 316, 320 
Schur’s, 206 
Length, 192, 193 
Lindemann, 216 
Linear algebra, 260 
Linear combination, 177 
Linear equations 

determinant of system of, 330 
rank of system of, 190 
Linear functional, 187, 200 
Linear homogeneous equations, 189, 190 
Linear independence, 177 
Linear span, 177 
Linear transformation(s), 261 
algebra of, 261 
decomposable set of, 291 
determinant of, 329 
elementary divisors of, 308, 309, 310 
Hermitian, 336 
invariants of nilpotent, 296 
invertible, 264 
irreducible set of, 291 
matrix of, 274 
nilpotent, 268, 292, 294 
nonnegative, 345 
normal, 342 
positive, 345 
positive definite, 345 
range of, 266 
rank of, 266 
regular, 264 
ring of, 261 
singular, 264 
trace of, 314 

Linearly dependent vectors, 177 
Liouville, 216 

Little Fermat theorem, 44, 366 

McCoy, 169 
McKay, 87, 119 
Maclane, 25 
Mapping(s), 10 
composition of, 13 
equality of, 13 
identity, 11 
inverse of, 15 
one-to-one, 12 
onto, 12 
product of, 13 
restriction of, 17 
set of all one-to-one, 15 
Matrix(ces), 273 
column of, 277 
companion, 307 
determinant of, 324 
diagonal, 282, 305 
Hermitian, 319, 322, 336 
of a linear transformation, 274 
orthogonal, 346 
permutation, 284 
real symmetric, 347 
row of, 277 
scalar, 279 
skew-symmetric, 317 
theory of, 260, 273 
trace of, 313 
transpose of, 316 
triangular, 284, 286 
unit, 279 

Maximal ideal, 138 
Minimal polynomial, 211, 264 
Module(s), 201 
cyclic, 202 
difference, 202 
direct sum of, 202 
finitely generated, 202 
fundamental theorem on finitely generated, 203 

homomorphism(s) of, 205 


irreducible, 206 
isomorphism of, 205 
order of element in, 206 
quotient, 202 
rank of, 203 
unital, 201 
Modulus, 22 
Monic polynomial, 160 
Morgan rules, De, 8 
Motzkin, 144, 169 
Multiple, least common, 23, 149 
Multiple root, 233 
Multiplicative system, 142 
Multiplicity 

of a characteristic root, 303 
of a root, 220 
Mutually disjoint, 5 

n x n matrix(ces) over F, 278 
algebra of all, 278, 279 
n-variables 

field of rational functions, 241 
polynomials in, 162 
ring of polynomials in, 162 
Newton’s identities, 249 
Nilpotence, index of, 268, 294 
Nilpotent group, 117 
Nilpotent linear transformation, 268, 
292, 294 

invariants of, 296 
Niven, 216, 259 
Non-abelian, 28 
Nonassociative ring, 121 
Nonnegative linear transformation, 345 
Nontrivial subgroups, 38 
Norm, 193 

Norm of quaternion, 372 
Normal extension(s), 244-248 
Normal linear transformation, 342 
Normal subgroup(s), 49 
Normalizer, 47, 84, 99, 361 
nth root of unity, primitive, 249 
Null set, 2 
Number(s) 

algebraic, 214-216 
constructible, 228-230 
prime, 19 

transcendental, 214 

Odd permutation, 78, 79 
One-to-one correspondence, 15 
One-to-one mapping(s), 12 
set of all, 15 
Onto mappings, 12 
Operation, closure under, 27 
Order 

of an element, 43 
of an element in a module, 206 
of a group, 28 

Orthogonal complement, 195 
Orthogonal matrices, 346 
Orthogonalization process, Gram-Schmidt, 196 

Orthonormal basis, 196, 338 
Orthonormal set, 196 
Outer automorphism, 70 
group of, 70 


p-Sylow subgroup, 93 
Pappus’ theorem, 361 
Partitions of an integer, 88 
Pentagon, regular, 232 
Perfect field, 236 
Period of an element, 43 
Permutation 
even, 78, 79 
groups, 75 
matrices, 284 
odd, 78, 79 
representation, 81 
representation, second, 81 
Perpendicularity, 191, 195 
phi-function, Euler, 43, 71, 227, 250 
Pigeonhole principle, 127 
Pollard, 259 
Polynomial(s) 

characteristic, 308, 332 
content of, 159, 163 
cyclotomic, 250, 362 
degree of, 152, 162 
division algorithm for, 155 
irreducible, 156 
minimal, 211, 264 
monic, 160 
in n-variables, 162 
over ring, 161 
over rational field, 159 
primitive, 159, 163 
ring of, 161 
roots of, 219 
symmetric, 243, 244 
value of, 209 
Positive 
definite, 345 

linear transformation, 345 
Prime 

primitive root of, 360 
relatively, 19, 147 
Prime element, 146, 163 
Prime ideal, 167 
Prime number, 19 
Primitive nth root of unity, 249 
Primitive polynomial, 159, 163 
Primitive root of a prime, 360 
Product 

Cartesian, 5, 6 
direct, 103 
dot, 192 
inner, 193 
of mappings, 13 
Projection, 11 
Proper subset, 2 

Quadratic forms, real, 350 
Quadratic residue, 116, 360 
Quaternions, 81, 124, 371 
adjoint of, 372 

group of quaternion units, 81 
integral, 371 
norm of, 372 
Quotient group, 52 
Quotient module, 202 
Quotient ring, 133 
Quotient space, 174 

Quotient structure, 51 
Quotients, field of, 140 


R-module, 201 
unital, 201 

Radical of an ideal, 167 
Radicals, solvable by, 250-256 
Range of linear transformation, 266 
Rank 

of linear transformation, 266 
of module, 203 

of system of linear equations, 190 
Rational canonical form, 305, 306, 
308 

Rational functions, 162, 241 
field of, 162, 241 
symmetric, 241 
Real quadratic forms, 350 
Real quaternions, 81 
Real symmetric matrix, 347 
Real vector space, 191 
Reflexivity of relations, 6 
Regular hexagon, 232 
Regular linear transformation, 264 
Regular pentagon, 232 
Regular septagon, 232 
Regular 15-gon, 232 
Regular 9-gon, 232 
Regular 17-gon, 232 
Relation(s) 
binary, 11 
equivalence, 6 
reflexivity of, 6 
symmetry of, 6 
transitivity of, 6 
Relatively prime, 19, 147 
Relatively prime integers, 19 
Remainder theorem, 219 
Representation, permutation, 81 
second, 81 

Residue, quadratic, 116, 360 
Resolution, spectral, 350 
Restriction of mapping, 17 
Right coset, 40 
Right ideal, 136 
Right invertible, 264 
Ring(s), 120 
associative, 121 
Boolean, 9, 130 
commutative, 121 
division, 126, 360 
Euclidean, 143, 371 
homomorphisms of, 131 
isomorphisms of, 133 
of linear transformations, 261 
nonassociative, 121 
polynomial, 161 
of polynomials, 161 
of polynomials in n-variables, 162 
quotient, 133 

of 2 x 2 rational matrices, 123 
unit in, 145 
with unit element, 121 
Root(s), 219, 232 

characteristic, 270, 286-289 
multiple, 233 
multiplicity of, 220, 303 
of polynomial, 219 
Row of matrix, 277 
Rule, Cramer’s, 331 
Rule, De Morgan’s, 8 


Samuel, 169 
Scalar(s), 171 
Scalar matrices, 279 
Scalar product, 192 
Schneider, 216 
Schur’s lemma, 206 
Schwarz’ inequality, 194 
Second dual, 188 

Second permutation representation, 
81 

Secular equation, 332 
Segal, 119 
Self-adjoint, 341 
Separable element, 236 
Separable extension, 236 
Septagon, regular, 232 
Set(s), 2 

of all one-to-one mappings, 15 
of all subsets, 12 
difference, 5 
disjoint, 4 
empty, 2 

image under mapping, 12 
index, 5 
infinite, 17 

of integers modulo n, 22, 23 
intersection of, 3, 4 
null, 2 

orthonormal, 196 
theory of, 2 
union of, 3 
Siegel, 216, 259 

Signature of a real quadratic form, 
352 

Similar, 285 
Similarity class, 285 
Simple extension, 235, 236 
Simple group, 60 
Singular, 264 

Singular linear transformation, 264 
Skew-field, 125 
Skew-Hermitian, 341 
Skew-symmetric matrix, 317 
Solvable group, 116, 252 
Solvable by radicals, 250-256 
Space(s) 

complex vector, 191 
dual, 184, 187 
inner product, 191, 337 
quotient, 174 
real vector, 191 
vector, 170 
Span, linear, 177 
Spectral resolution, 350 
Splitting field, 222-227, 245 
Straightedge and compass, construction 
with, 228 
Subgroup(s), 37 

commutator, 65, 70, 117, 252, 253 
conjugate, 99 
cyclic, 38 

generated by a set, 64 
higher commutator, 253 
left coset of, 47 
nontrivial, 38 
normal, 49 
p-Sylow, 93 
right coset of, 40 
trivial, 38 
Subgroup of G 
characteristic, 70 

commutator, 65, 70, 117, 252, 253 
generated by a set, 64 
Submodule, 202 
Subset(s), 2 
diagonal, 6 
proper, 2 

restriction of mapping to, 17 
set of all, 12 
Subspace, 172 
annihilator of, 188 
cyclic, 296, 306 
invariant, 285, 290 
Sum 

direct, 202 
external direct, 175 
internal direct, 174, 175 
Sylow, 62, 87, 91 
Sylow’s theorem, 62, 91-101 
Sylvester’s law of inertia, 352 
Symmetric difference, 9 
Symmetric functions, elementary, 242, 243 

Symmetric group(s), 28, 75, 241, 253-257, 284 

Symmetric matrix, 317 
Symmetric polynomial, 243, 244 
Symmetric rational functions, 241 
field of, 241 

Symmetry of relations, 6 
System, multiplicative, 142 
System of linear equations, 189, 190 
determinant of, 330 
rank of, 190 


Theorem 

of algebra, fundamental, 337 
Brauer-Cartan-Hua, 368 
Cauchy’s, 61, 87 

Cayley-Hamilton, 263, 309, 334, 335 

Cayley’s, 71, 262 
Desargues’, 361 
Fermat, 44, 152, 366 
four-square, 371 
Frobenius’, 356, 359 
Jacobson’s, 367 
Lagrange’s, 40, 356, 375 
little Fermat, 44, 366 
Pappus’, 361 
remainder, 219 
Sylow’s, 62, 91-101 
on symmetric polynomials, 244 
unique factorization, 20, 148 
Wedderburn’s, 355, 360, 376 
Wilson’s, 116, 152 
Theory 

Galois, 237-259 
matrix, 260, 273 
set, 2 

Thompson, 60 
Trace, 313 

of a linear transformation, 314 
of a matrix, 313 
Transcendence 
of e, 216 
of π, 216 

Transcendental number(s), 214 
Transformation(s) 
algebra of linear, 261 
Hermitian linear, 336, 341 
invariants of nilpotent linear, 296 
invertible linear, 264 
linear, 261 

nilpotent linear, 268, 292, 294 
nonnegative linear, 345 
normal linear, 336, 342 
range of linear, 266 
rank of linear, 266 
regular linear, 261 
singular linear, 264 
unitary, 336, 338 
Transitivity of relations, 6 
Transpose, 313, 316 
of a matrix, 316 
Transpositions, 78 
Triangle inequality, 199 
Triangular form, 285 
Triangular matrix, 284, 286 
Trisecting an angle, 230 
Trivial subgroups, 38 

Union of sets, 3 

Unique factorization domain, 163 

Unique factorization theorem, 20, 148 

Unit in matrix algebra, 279 

Unit in ring, 145 

Unital R-module, 201 

Unitary transformation, 336, 338 

Unity, primitive nth root of, 249 

Value of polynomial, 209 
Van Der Waerden, 259 
Vandiver, 362 
Vector(s), 171 
characteristic, 271 
linearly dependent, 177 
Vector space(s), 170 
complex, 191 


homomorphism of, 173 
isomorphism of, 173 
real, 191 

Waerden, Van Der, 259 
Waring problem, 377 
Wedderburn, 355, 356, 360 
Wedderburn’s theorem, 355, 360, 376 
Weisner, 259 
Wielandt, 92 
Wilson’s theorem, 116, 152 
Wright, 378 

Zariski, 169 
Zero-divisor, 125 
Zero-matrix, 279 

15-gon, regular, 232 

9-gon, regular, 232 

17-gon, regular, 232 

2x2 rational matrices, ring of, 123 






